Mapping pipeline skeletons onto heterogeneous platforms
Mapping pipeline skeletons onto heterogeneous platforms
Anne Benoit and Yves Robert
GRAAL team, LIP, École Normale Supérieure de Lyon
January 2007
Introduction and motivation
- Mapping applications onto parallel platforms is a difficult challenge; heterogeneous clusters and fully heterogeneous platforms are even more difficult!
- Structured programming approach: easier to program (avoids deadlocks and process starvation), with a range of well-known paradigms (pipeline, farm)
- Algorithmic skeletons: a help for mapping
Why restrict to pipelines?
Chains-on-chains partitioning problem: no communications, identical processors.
Extensions (done):
- with communications
- with heterogeneous processors/links
- goal: assess complexity, design heuristics
Extensions (current work):
- deal with DEALs
- deal with DAGs
Chains-on-chains
- Load-balance contiguous tasks (example: with p = 4 processors?)
- Back to the Bokhari and Iqbal partitioning papers
- See the survey by Pinar and Aykanat, JPDC 64, 8 (2004)
Rule of the game
- Map each pipeline stage on a single processor (no deals)
- Goal: minimize execution time
- Several mapping strategies for the pipeline application S_1 S_2 ... S_k ... S_n: One-to-one Mapping, Interval Mapping, General Mapping
Major contributions
Theory: formal approach to the problem; problem complexity; integer linear program for exact resolution.
Practice: heuristics for Interval Mapping on clusters; experiments to compare the heuristics and evaluate their absolute performance.
Outline
1. Framework
2. Complexity results
3. Heuristics
4. Experiments
5. Linear programming formulation
6. Conclusion
The application
A pipeline of n stages S_k, 1 ≤ k ≤ n. Stage S_k receives an input of size δ_{k-1} from S_{k-1}, performs w_k computations, and outputs data of size δ_k to S_{k+1}. S_0 and S_{n+1} are virtual stages representing the outside world.
The platform
p processors P_u, 1 ≤ u ≤ p, fully interconnected. s_u: speed of processor P_u. Bidirectional link link_{u,v}: P_u → P_v, with bandwidth b_{u,v}. One-port model: each processor can either send, receive or compute at any time-step. P_in provides the input data, P_out collects the output data.
Different platforms
- Fully Homogeneous: identical processors (s_u = s) and links (b_{u,v} = b): typical parallel machines
- Communication Homogeneous: different-speed processors (s_u ≠ s_v), identical links (b_{u,v} = b): networks of workstations, clusters
- Fully Heterogeneous: s_u ≠ s_v and b_{u,v} ≠ b_{u',v'}: hierarchical platforms, grids
Mapping problem: One-to-one Mapping
Requires n ≤ p: map each stage S_k onto a distinct processor P_alloc(k). The period of P_alloc(k) is the minimum delay between the processing of two consecutive tasks. With t = alloc(k-1), u = alloc(k), v = alloc(k+1), the cycle-time of P_u is
cycle_u = δ_{k-1}/b_{t,u} + w_k/s_u + δ_k/b_{u,v}
Optimization problem: find the allocation function alloc: [1, n] → [1, p] which minimizes T_period = max_{1≤k≤n} cycle_alloc(k) (with alloc(0) = in and alloc(n+1) = out).
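The cycle-time formula above is straightforward to turn into code. A minimal sketch (not code from the talk; the data layout and names are my own): given an allocation, compute each stage's cycle-time and take the maximum.

```python
def period_one_to_one(delta, w, s, b, alloc):
    """Period of a One-to-one Mapping.

    delta[k]:  data size between S_k and S_{k+1} (k = 0..n)
    w[k]:      work of stage S_k (k = 1..n; w[0] unused)
    s[u]:      speed of processor u
    b[(t, u)]: bandwidth of the link t -> u
    alloc[k]:  processor of stage k, with alloc[0] = "in", alloc[n+1] = "out"
    """
    n = len(w) - 1
    cycles = []
    for k in range(1, n + 1):
        t, u, v = alloc[k - 1], alloc[k], alloc[k + 1]
        # cycle_u = delta_{k-1}/b_{t,u} + w_k/s_u + delta_k/b_{u,v}
        cycles.append(delta[k - 1] / b[(t, u)] + w[k] / s[u] + delta[k] / b[(u, v)])
    return max(cycles)

# Tiny example (made-up numbers): two stages on two processors.
delta = [2, 4, 2]
w = [None, 4, 6]
s = {1: 2, 2: 3}
b = {("in", 1): 2, (1, 2): 2, (2, "out"): 2}
alloc = {0: "in", 1: 1, 2: 2, 3: "out"}
print(period_one_to_one(delta, w, s, b, alloc))  # 5.0
```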
Mapping problem: Interval Mapping
Several consecutive stages are mapped onto the same processor: this increases the computational load but reduces communications, and it is mandatory when p < n. Partition [1..n] into m intervals I_j = [d_j, e_j] (with d_j ≤ e_j for 1 ≤ j ≤ m, d_1 = 1, d_{j+1} = e_j + 1 for 1 ≤ j ≤ m-1, and e_m = n). Interval I_j is mapped onto processor P_alloc(j), and
T_period = max_{1≤j≤m} { δ_{d_j - 1}/b_{alloc(j-1),alloc(j)} + (Σ_{i=d_j}^{e_j} w_i)/s_alloc(j) + δ_{e_j}/b_{alloc(j),alloc(j+1)} }
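The interval-period formula can be sketched the same way (again my own naming, not code from the talk): sum the work of each interval and add the two boundary communications.

```python
def period_interval(delta, w, s, b, intervals, alloc):
    """Period of an Interval Mapping.

    intervals: list of (d_j, e_j) pairs covering stages 1..n in order
    alloc[j]:  processor of interval j
    Other arguments as in the one-to-one case; "in" and "out" are the
    virtual endpoint processors.
    """
    procs = ["in"] + list(alloc) + ["out"]
    periods = []
    for j, (d, e) in enumerate(intervals):
        prev, cur, nxt = procs[j], procs[j + 1], procs[j + 2]
        periods.append(
            delta[d - 1] / b[(prev, cur)]          # incoming communication
            + sum(w[d:e + 1]) / s[cur]             # work of the whole interval
            + delta[e] / b[(cur, nxt)]             # outgoing communication
        )
    return max(periods)

# Example (made-up numbers): stages {1,2} on processor 1, stage {3} on 2.
print(period_interval([2, 4, 6, 2], [None, 2, 4, 6], {1: 2, 2: 3},
                      {("in", 1): 2, (1, 2): 2, (2, "out"): 2},
                      [(1, 2), (3, 3)], [1, 2]))  # 7.0
```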
Mapping problem: General Mapping
Does not suit the one-port model very well: on Communication Homogeneous platforms, a General Mapping can always be replaced by an Interval Mapping that is as good. It can be the optimal mapping on Fully Heterogeneous platforms in some particular cases. More general, but requires threads, and may lead to idle times and races under the one-port model.
Complexity results

                     Fully Hom.    Comm. Hom.
One-to-one Mapping   polynomial    polynomial
Interval Mapping     polynomial    NP-complete
General Mapping      same complexity as Interval Mapping

- Binary search polynomial algorithm for One-to-one Mapping
- Dynamic programming algorithm for Interval Mapping on Homogeneous platforms
- General Mapping: same complexity as Interval Mapping
- All problem instances are NP-complete on Fully Heterogeneous platforms
Back to chains-on-chains
- Chains-on-chains + homogeneous communications: polynomial
- Chains-on-chains + different-speed processors: NP-complete
One-to-one / Comm. Hom.: binary search algorithm
Work with the n fastest processors, numbered P_1 to P_n with s_1 ≤ s_2 ≤ ... ≤ s_n. Mark all stages S_1 to S_n as free. For u = 1 to n: pick any free stage S_k such that δ_{k-1}/b + w_k/s_u + δ_k/b ≤ T_period, assign S_k to P_u, and mark S_k as assigned; if no such stage is found, return failure. Combined with a binary search on the target period T_period. Correctness proof: exchange argument.
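A hedged sketch of this algorithm (communication-homogeneous model with a single bandwidth b; data layout mine): the greedy check is monotone in the target period, so it suffices to search over the n·p candidate cycle-times.

```python
def feasible(T, stages, speeds, b):
    """Greedy check: processors by increasing speed; each takes any free
    stage whose cycle-time fits under T (valid by an exchange argument)."""
    free = set(range(len(stages)))
    for s_u in speeds:  # speeds sorted by increasing speed
        fitting = [k for k in free
                   if stages[k][0] / b + stages[k][1] / s_u + stages[k][2] / b <= T]
        if not fitting:
            return False
        free.remove(fitting[0])
    return not free

def best_one_to_one_period(stages, speeds, b):
    """stages[k] = (delta_in, w, delta_out); speeds = the n fastest
    processor speeds, sorted increasingly."""
    cands = sorted({d_in / b + w / s + d_out / b
                    for (d_in, w, d_out) in stages for s in speeds})
    lo, hi, best = 0, len(cands) - 1, None
    while lo <= hi:  # binary search over the candidate periods
        mid = (lo + hi) // 2
        if feasible(cands[mid], stages, speeds, b):
            best, hi = cands[mid], mid - 1
        else:
            lo = mid + 1
    return best

print(best_one_to_one_period([(2, 4, 2), (2, 8, 2)], [2, 4], 2.0))  # 4.0
```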
Interval, Fully Hom.: dynamic programming algorithm
c(i, j, k): optimal period to map stages S_i to S_j using exactly k processors. Goal: min_{1≤k≤p} c(1, n, k).
c(i, j, k) = min_{q+r=k, 1≤q,r≤k-1} min_{i≤l≤j-1} max( c(i, l, q), c(l+1, j, r) )
c(i, j, 1) = δ_{i-1}/b + (Σ_{k=i}^{j} w_k)/s + δ_j/b
c(i, j, k) = +∞ if k > j - i + 1
Proof: search over all possible partitionings into two subintervals, using every possible number of processors for each subinterval. Complexity: O(n^3 p^2).
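The recurrence above transcribes almost literally (naming mine; memoized rather than bottom-up, same table as the O(n^3 p^2) algorithm):

```python
from functools import lru_cache

def interval_dp_period(delta, w, p, s, b):
    """Optimal Interval Mapping period on a Fully Homogeneous platform
    (speed s, bandwidth b), for stages 1..n and up to p processors."""
    n = len(w) - 1
    INF = float("inf")

    @lru_cache(maxsize=None)
    def c(i, j, k):
        if k > j - i + 1:      # more processors than stages: impossible
            return INF
        if k == 1:             # a single interval on one processor
            return delta[i - 1] / b + sum(w[i:j + 1]) / s + delta[j] / b
        # split into two subintervals using q and k - q processors
        return min(max(c(i, l, q), c(l + 1, j, k - q))
                   for q in range(1, k)
                   for l in range(i, j))

    return min(c(1, n, k) for k in range(1, p + 1))

print(interval_dp_period([2, 2, 2], [None, 4, 8], 2, 2.0, 2.0))  # 6.0
```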
Heterogeneous platforms: NP-complete
Reduction from MINIMUM METRIC BOTTLENECK WANDERING SALESPERSON PROBLEM. [Figure: reduction gadget, a chain from P_in through processors P_u and P_{u,v} to P_out, whose link bandwidths encode the city distances d(c_i, c_j).]
Greedy heuristics (1/2)
Target: Communication Homogeneous platforms (clusters) and Interval Mapping. Use L = n/p consecutive stages per processor: the set of intervals is fixed, and n/L processors are used. This reduces to a One-to-one Mapping when n ≤ p (L = 1). The rule is applied in all greedy heuristics except the random-interval-length one.
Greedy heuristics (2/2)
- H1a-GR (random): random choice of a free processor for each interval
- H1b-GRIL (random interval length): idem with random interval sizes: average length L, 1 ≤ length ≤ 2L-1
- H2-GSW (biggest w): place the interval with the most computations on the fastest processor
- H3-GSD (biggest δ_in + δ_out): intervals sorted by communications (δ_in + δ_out); in: first stage of the interval, out-1: last one
- H4-GP (biggest period on fastest processor): balance computation and communication: processors sorted by decreasing speed s_u; for the current processor u, choose the interval with the biggest period (δ_in + δ_out)/b + Σ_{i∈Interval} w_i/s_u
Sophisticated heuristics
- H5-BS121 (binary search for One-to-one Mapping): the optimal algorithm for One-to-one Mapping; when p < n, the application is cut into fixed intervals of length L
- H6-SPL (splitting intervals): processors sorted by decreasing speed, all stages assigned to the first processor; at each step, select the used processor j with the largest period and split its interval, giving a fraction of its stages to a processor j', so as to minimize max(period(j), period(j')); split only if the maximum period improves
- H7a-BSL and H7b-BSC (binary search, longest/closest): binary search on the period P: starting from stage s = 1, build intervals (s, s') fitting on processors; for each u and each s' ≥ s, compute period(s..s', u) and check whether it is smaller than P; H7a maximizes s', H7b chooses the closest period
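To make H6-SPL concrete, here is a rough reconstruction of the splitting idea under the communication-homogeneous model (single bandwidth b). Everything below is my own simplification of the heuristic: the new half always goes to the next unused fastest processor, and only the right part of an interval is given away.

```python
def h6_splitting(delta, w, speeds, b):
    """Sketch of the H6-SPL idea: all stages on the fastest processor,
    then repeatedly split the interval with the largest period."""
    n = len(w) - 1
    speeds = sorted(speeds, reverse=True)

    def period(d, e, u):
        return delta[d - 1] / b + sum(w[d:e + 1]) / speeds[u] + delta[e] / b

    intervals = [(1, n, 0)]   # (first stage, last stage, processor index)
    nxt = 1                   # next unused processor
    while nxt < len(speeds):
        periods = [period(d, e, u) for (d, e, u) in intervals]
        cur = max(periods)
        j = periods.index(cur)
        d, e, u = intervals[j]
        if d == e:
            break             # a single stage cannot be split
        # best split point: minimize max(period(left), period(right))
        m, l = min((max(period(d, l, u), period(l + 1, e, nxt)), l)
                   for l in range(d, e))
        if m >= cur:
            break             # the maximum period would not improve
        intervals[j] = (d, l, u)
        intervals.insert(j + 1, (l + 1, e, nxt))
        nxt += 1
    return intervals, max(period(d, e, u) for (d, e, u) in intervals)

# Two equal stages, two equal processors: splitting halves the period.
print(h6_splitting([0, 0, 0], [None, 4, 4], [2, 2], 1.0)[1])  # 2.0
```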
Plan of experiments
- Assess the performance of the polynomial heuristics
- Random applications, n = 1 to 50 stages
- Random platforms, p = 10 and p = 100 processors
- b = 10 (comm. hom.), processor speeds between 1 and 20
- Relevant parameters: the ratios δ/b and w/s
- Average over 100 similar random application/platform pairs
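A sketch of an instance generator matching this setup (the talk only gives the ranges; the uniform distributions and the function's shape are my assumptions):

```python
import random

def random_instance(n, p, delta_range=(1, 20), w_range=(1, 20), seed=None):
    """Random application/platform pair: n stages, p processors,
    homogeneous bandwidth b = 10, processor speeds drawn in [1, 20].
    delta_range and w_range vary per experiment (e.g. Experiment 3
    uses w in [10, 1000])."""
    rng = random.Random(seed)
    delta = [rng.uniform(*delta_range) for _ in range(n + 1)]  # delta_0..delta_n
    w = [None] + [rng.uniform(*w_range) for _ in range(n)]     # w_1..w_n
    speeds = [rng.uniform(1, 20) for _ in range(p)]
    return delta, w, speeds, 10.0

delta, w, speeds, b = random_instance(n=10, p=10, seed=0)
```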
Experiment 1 - balanced comm/comp, homogeneous comm: δ_i = 10, computation time between 1 and 20.
[Plots: maximum period vs. number of stages for heuristics H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest; p = 10 and p = 100.]
Experiment 2 - balanced comm/comp, heterogeneous comm: communication time between 1 and 100, computation time between 1 and 20.
[Plots: maximum period vs. number of stages for the same heuristics; p = 10 and p = 100.]
Experiment 3 - large computations: communication time between 1 and 20, computation time between 10 and 1000.
[Plots: maximum period vs. number of stages for the same heuristics; p = 10 and p = 100.]
Experiment 4 - small computations: communication time between 1 and 20, computation time between 0.01 and 10.
[Plots: maximum period vs. number of stages for the same heuristics; p = 10 and p = 100.]
Summary of experiments
The heuristics are much more efficient than random mappings. Three heuristics dominate, in different cases:
- Insignificant communications (homogeneous or small) and many processors: H5-BS121 (One-to-one Mapping)
- Insignificant communications (homogeneous or small) and few processors: H7b-BSC (clever choice of where to split)
- Important communications (heterogeneous or big): H6-SPL (its splitting choice is relevant for any number of processors)
Integer linear programming
An integer LP solves Interval Mapping on Fully Heterogeneous platforms exactly. It has many integer variables, and no efficient algorithm exists to solve it, so the approach is limited to small problem instances. It provides the absolute performance of the heuristics on such instances.
Linear program: variables
- x_{k,u}: 1 if S_k is on P_u, 0 otherwise
- y_{k,u}: 1 if S_k and S_{k+1} are both on P_u, 0 otherwise
- z_{k,u,v}: 1 if S_k is on P_u and S_{k+1} is on P_v, 0 otherwise
- first_u and last_u: integers denoting the first and last stage assigned to P_u (to enforce the interval constraints)
- T_period: period of the pipeline
Objective function: minimize T_period
Linear program: constraints
- ∀k ∈ [0..n+1]: Σ_u x_{k,u} = 1
- ∀k ∈ [0..n]: Σ_{u≠v} z_{k,u,v} + Σ_u y_{k,u} = 1
- ∀k ∈ [0..n], ∀u, v ∈ [1..p] ∪ {in, out}, u ≠ v: x_{k,u} + x_{k+1,v} ≤ 1 + z_{k,u,v}
- ∀k ∈ [0..n], ∀u ∈ [1..p] ∪ {in, out}: x_{k,u} + x_{k+1,u} ≤ 1 + y_{k,u}
- ∀k ∈ [1..n], ∀u ∈ [1..p]: first_u ≤ k·x_{k,u} + n·(1 - x_{k,u})
- ∀k ∈ [1..n], ∀u ∈ [1..p]: last_u ≥ k·x_{k,u}
- ∀k ∈ [1..n-1], ∀u, v ∈ [1..p], u ≠ v: last_u ≤ k·z_{k,u,v} + n·(1 - z_{k,u,v})
- ∀k ∈ [1..n-1], ∀u, v ∈ [1..p], u ≠ v: first_v ≥ (k+1)·z_{k,u,v}
- ∀u: Σ_{k=1}^{n} ( Σ_{t≠u} δ_{k-1}·z_{k-1,t,u}/b_{t,u} + w_k·x_{k,u}/s_u + Σ_{v≠u} δ_k·z_{k,u,v}/b_{u,v} ) ≤ T_period
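For very small instances, an exhaustive search gives the same ground truth as the ILP without needing a solver. A hedged sketch (a communication-homogeneous simplification with a single bandwidth b, unlike the fully heterogeneous ILP above; naming mine):

```python
from itertools import combinations, permutations

def optimal_interval_period(delta, w, speeds, b):
    """Brute-force optimal Interval Mapping period: enumerate every
    partition of stages 1..n into m contiguous intervals and every
    assignment of m distinct processors.  Only viable for tiny n and p."""
    n, p = len(w) - 1, len(speeds)
    best = float("inf")
    for m in range(1, min(n, p) + 1):
        # choose m - 1 cut points among the n - 1 inter-stage gaps
        for cuts in combinations(range(1, n), m - 1):
            bounds = [1] + [c + 1 for c in cuts] + [n + 1]
            ivals = [(bounds[j], bounds[j + 1] - 1) for j in range(m)]
            for procs in permutations(range(p), m):
                T = max(delta[d - 1] / b + sum(w[d:e + 1]) / speeds[u]
                        + delta[e] / b
                        for (d, e), u in zip(ivals, procs))
                best = min(best, T)
    return best

print(optimal_interval_period([2, 2, 2], [None, 4, 8], [2.0, 2.0], 2.0))  # 6.0
```

On Fully Homogeneous instances this must agree with the dynamic programming algorithm, which makes it a convenient cross-check.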
Linear program: experiments
O(np^2) variables and as many constraints. Experiments only on small problem instances; average over 10 instances of each application; solved with GLPK. Largest experiment: p = 8, n = 4, a 14-hour computation. Parameters similar to Experiment 1: homogeneous communications and balanced comm/comp.
Linear program: experiment p = 8
[Plot and table: maximum period vs. number of stages (p = 8) for the Linear Program and heuristics H3-GreedySumDinDout, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest; the numerical values did not survive transcription.]
Linear program: experiment p = 4, homogeneous communications (Experiment 1 parameters)
[Plot: maximum period vs. number of stages (p = 4) for the Linear Program and the same heuristics.]
H7b is very close to the optimal (< 3% error).
Linear program: experiment p = 4, heterogeneous communications (Experiment 2 parameters)
[Plot: maximum period vs. number of stages (p = 4) for the Linear Program and the same heuristics.]
H6 is very close to the optimal (< 0.05% error).
Related work
- Scheduling task graphs on heterogeneous platforms: acyclic task graphs scheduled on different-speed processors [Topcuoglu et al.]; communication contention under the one-port model [Beaumont et al.]
- Mapping pipelined computations onto special-purpose architectures: FPGA arrays [Fabiani et al.]; fault-tolerance for embedded systems [Zhu et al.]
- Mapping pipelined computations onto clusters and grids: DAGs [Taura et al.], DataCutter [Saltz et al.]
- Mapping skeletons onto clusters and grids: use of stochastic process algebra [Benoit et al.]
Conclusion
Theoretical side: complexity of the different mapping strategies on the different platform types.
Practical side: an optimal polynomial algorithm for One-to-one Mapping; several heuristics for Interval Mapping on Communication Homogeneous platforms, with a comparison of their performance; a linear program to assess the absolute performance of the heuristics, which turns out to be quite good.
87 Future work
Short term: heuristics for Fully Heterogeneous platforms; extension to DAG-trees (a DAG which is a tree when un-oriented); extension to stage replication; LP with replication and DAG-trees.
Longer term: real experiments on heterogeneous clusters, using an already-implemented skeleton library and MPI; comparison of effective performance against theoretical performance.
Yves.Robert@ens-lyon.fr January 2007 Mapping pipeline skeletons MAO 41/ 41