Complexity results for throughput and latency optimization of replicated and data-parallel workflows
1 Complexity results for throughput and latency optimization of replicated and data-parallel workflows. Anne Benoit and Yves Robert, GRAAL team, LIP, École Normale Supérieure de Lyon. Alpage meeting, June 2007.
2 Introduction and motivation. Mapping workflow applications onto parallel platforms is a difficult challenge, and heterogeneous clusters and fully heterogeneous platforms make it even more difficult. A structured programming approach is easier to program (no deadlocks, no process starvation) and offers a range of well-known paradigms (pipeline, farm); algorithmic skeletons help with the mapping. Here: mapping pipeline and fork workflows.
8 Rule of the game. Consecutive data sets are fed into the workflow. Period T_period = time interval between the beginnings of execution of two consecutive data sets (throughput = 1/T_period). Latency T_latency(x) = time elapsed between the beginning and the end of execution for a given data set x, and T_latency = max_x T_latency(x). Each pipeline/fork stage is mapped onto one or several processors. Goal: minimize T_period, or T_latency, or a bi-criteria minimization.
10 Replication and data-parallelism. Replicate stage S_k on P_1, ..., P_q: S_k on P_1 processes data sets 1, 4, 7, ...; S_k on P_2 processes data sets 2, 5, 8, ...; S_k on P_3 processes data sets 3, 6, 9, ... Data-parallelize stage S_k (w = 16) on P_1, ..., P_q: each data set is split among P_1 (s_1 = 2), P_2 (s_2 = 1) and P_3 (s_3 = 1).
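The round-robin rule behind replication can be sketched in a few lines; `replica_for` is a hypothetical helper name, not from the talk.

```python
def replica_for(n, q):
    """Round-robin replication: data set n (1-indexed) is processed by
    replica ((n - 1) mod q) + 1 of the replicated stage."""
    return (n - 1) % q + 1

# With q = 3 replicas: P1 gets 1, 4, 7, ...; P2 gets 2, 5, 8, ...; P3 gets 3, 6, 9, ...
print([replica_for(n, 3) for n in range(1, 10)])  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```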
12 Major contributions. A theoretical approach to the problem: definition of replication and data-parallelism, and a formal definition of T_period and T_latency in each case. Problem complexity: focus on pipeline and fork workflows.
14 Outline. 1 Framework. 2 Working out an example. 3 The problem. 4 Complexity results. 5 Conclusion.
16 Pipeline graphs. A pipeline of n stages S_k, 1 ≤ k ≤ n. Stage S_k receives an input of size δ_{k-1} from S_{k-1}, performs w_k computations, and outputs data of size δ_k to S_{k+1}.
17 Fork graphs. A fork of n + 1 stages S_k, 0 ≤ k ≤ n: S_0 is the root stage (workload w_0, output of size δ_0 sent to each branch), and S_1 to S_n are independent stages (workloads w_1, ..., w_n, outputs δ_1, ..., δ_n). A data set goes through stage S_0; afterwards, all other stages can process it simultaneously.
18 The platform. p processors P_u, 1 ≤ u ≤ p, fully interconnected, plus an input processor P_in (speed s_in) and an output processor P_out (speed s_out). s_u is the speed of processor P_u; bidirectional link link_{u,v}: P_u to P_v with bandwidth b_{u,v} (and links of bandwidth b_{in,u} and b_{v,out} for input and output). One-port model: each processor can either send, receive or compute at any time step.
19 Different platforms. Fully Homogeneous: identical processors (s_u = s) and links (b_{u,v} = b); typical parallel machines. Communication Homogeneous: different-speed processors (s_u ≠ s_v), identical links (b_{u,v} = b); networks of workstations, clusters. Fully Heterogeneous: fully heterogeneous architectures, s_u ≠ s_v and b_{u,v} ≠ b_{u',v'}; hierarchical platforms, grids.
20 Back to pipeline: mapping strategies. For the pipeline application S_1, S_2, ..., S_k, ..., S_n, three strategies: One-to-one Mapping (each stage on a distinct processor), Interval Mapping (intervals of consecutive stages mapped to the same processor), and General Mapping (arbitrary). In this work: Interval Mapping.
25 Chains-on-chains. Load-balance contiguous tasks. With p = 4 identical processors? T_period = 20. NP-hard for different-speed processors, even without communications.
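The chains-on-chains problem on identical processors can be solved by a simple dynamic program over prefix sums; the task weights of the slide's figure are lost in the transcription, so the weights in the usage line below are hypothetical.

```python
def chains_on_chains(weights, p):
    """Minimal achievable max interval load when partitioning `weights`
    into at most p contiguous intervals (identical processors). O(n^2 * p) DP."""
    n = len(weights)
    prefix = [0]
    for x in weights:
        prefix.append(prefix[-1] + x)
    INF = float("inf")
    # best[k][i] = optimal max load for the first i tasks split into k intervals
    best = [[INF] * (n + 1) for _ in range(p + 1)]
    best[0][0] = 0
    for k in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                load = prefix[i] - prefix[j]          # load of interval (j, i]
                best[k][i] = min(best[k][i], max(best[k - 1][j], load))
    return min(best[k][n] for k in range(1, p + 1))

print(chains_on_chains([4, 5, 2, 6, 3], 3))  # 9, e.g. [4,5 | 2,6 | 3]
```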
30 Working out an example. A pipeline S_1, S_2, S_3, S_4; interval mapping, 4 processors, s_1 = 2 and s_2 = s_3 = s_4 = 1. Optimal period? T_period = 7, with S_1 on P_1, S_2 S_3 on P_2, S_4 on P_3 (then T_latency = 17). Optimal latency? T_latency = 12, with S_1 S_2 S_3 S_4 on P_1 (then T_period = 12). Minimum latency if T_period ≤ 10? T_latency = 14, with S_1 S_2 S_3 on P_1 and S_4 on P_2.
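The quoted optima can be checked by brute force over interval mappings (each interval on one processor, no communication costs). The stage workloads are garbled in the transcription; the values (14, 4, 2, 4) below are an assumption, chosen to be consistent with the quoted results (7, 17, 12).

```python
from itertools import combinations, permutations

w = [14, 4, 2, 4]        # stage workloads: assumed, consistent with the slide's numbers
speeds = [2, 1, 1, 1]    # s1 = 2, s2 = s3 = s4 = 1

def evaluate(cuts, procs):
    """Period and latency of one interval mapping: `cuts` are the split points,
    interval r runs alone on processor procs[r]."""
    bounds = [0] + list(cuts) + [len(w)]
    times = [sum(w[bounds[r]:bounds[r + 1]]) / speeds[procs[r]]
             for r in range(len(bounds) - 1)]
    return max(times), sum(times)          # (T_period, T_latency)

best = min(
    evaluate(cuts, procs)
    for m in range(1, 5)                           # number of intervals
    for cuts in combinations(range(1, 4), m - 1)   # where to cut [S1..S4]
    for procs in permutations(range(4), m)         # distinct processor per interval
)
print(best)   # (7.0, 17.0): S1 -> P1, S2 S3 -> P2, S4 -> P3
```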
34 Example with replication and data-parallelism. Same pipeline: interval mapping, 4 processors, s_1 = 2 and s_2 = s_3 = s_4 = 1. Replicate an interval [S_u .. S_v] on P_1, ..., P_q: P_1 processes data sets 1, 4, 7, ..., P_2 processes data sets 2, 5, 8, ..., P_3 processes data sets 3, 6, 9, ... Then T_period = (Σ_{k=u}^{v} w_k) / (q · min_i s_i) and T_latency = q · T_period.
35 Data-parallelize a single stage S_k (w = 16) on P_1 (s_1 = 2), P_2 (s_2 = 1), P_3 (s_3 = 1): each data set is split among the processors proportionally to their speeds. Then T_period = w_k / (Σ_{i=1}^{q} s_i) and T_latency = T_period.
36 Optimal period? Two mappings achieve T_period = 5: (i) data-parallelize S_1 on P_1 P_2 and replicate S_2 S_3 S_4 on P_3 P_4, giving T_period = max(14/(2+1), 10/(2·1)) = 5 and T_latency = 14/3 + 2·5 = 14.67; (ii) data-parallelize S_1 on P_2 P_3 P_4 and map S_2 S_3 S_4 on P_1, giving T_period = max(14/(1+1+1), 10/2) = 5 and T_latency = 14/3 + 5 = 9.67 (optimal).
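These numbers follow directly from the two cost formulas above; a minimal sketch (helper names `dp_cost` and `rep_period` are mine, not from the talk):

```python
def dp_cost(wsum, speeds):
    """Data-parallelized interval: period = latency = sum(w) / sum(s)."""
    return wsum / sum(speeds)

def rep_period(wsum, speeds):
    """Replicated interval: period = sum(w) / (q * min(s)); latency = q * period."""
    return wsum / (len(speeds) * min(speeds))

# Mapping (i): data-parallelize S1 (w1 = 14) on P1, P2; replicate S2 S3 S4 on P3, P4.
period_i = max(dp_cost(14, [2, 1]), rep_period(10, [1, 1]))        # max(14/3, 5) = 5.0
latency_i = dp_cost(14, [2, 1]) + 2 * rep_period(10, [1, 1])       # 14/3 + 10 = 14.67

# Mapping (ii): data-parallelize S1 on P2, P3, P4; run S2 S3 S4 on P1 alone (speed 2).
t1, t2 = dp_cost(14, [1, 1, 1]), 10 / 2
period_ii, latency_ii = max(t1, t2), t1 + t2
print(period_ii, round(latency_ii, 2))   # 5.0 9.67 (optimal latency for period 5)
```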
40 Interval Mapping for pipeline graphs. Several consecutive stages are mapped onto the same processor: this increases the computational load but reduces communications. Partition [1..n] into m intervals I_j = [d_j, e_j] (with d_j ≤ e_j for 1 ≤ j ≤ m, d_1 = 1, d_{j+1} = e_j + 1 for 1 ≤ j ≤ m − 1, and e_m = n). Interval I_j is mapped onto processor P_alloc(j), and:

T_period = max_{1≤j≤m} { δ_{d_j−1} / b_{alloc(j−1),alloc(j)} + (Σ_{i=d_j}^{e_j} w_i) / s_{alloc(j)} + δ_{e_j} / b_{alloc(j),alloc(j+1)} }

T_latency = Σ_{1≤j≤m} { δ_{d_j−1} / b_{alloc(j−1),alloc(j)} + (Σ_{i=d_j}^{e_j} w_i) / s_{alloc(j)} + δ_{e_j} / b_{alloc(j),alloc(j+1)} }
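The two formulas can be evaluated mechanically; a sketch under assumed toy values (the instance below, the function name and the in/out processor labels are illustrative, not from the talk):

```python
def map_costs(delta, w, intervals, alloc, s, b):
    """T_period and T_latency of an Interval Mapping (formulas of this slide).
    delta[k] : data size between S_k and S_{k+1} (delta[0] is the input size)
    intervals: list of (d_j, e_j) stage ranges, 1-indexed, covering [1..n]
    alloc    : alloc[j] = processor of interval j; alloc[0] and alloc[-1]
               are the input and output processors
    s[u]     : speed of P_u;  b[(u, v)] : bandwidth of link P_u -> P_v"""
    terms = []
    for j, (d, e) in enumerate(intervals, start=1):
        terms.append(delta[d - 1] / b[(alloc[j - 1], alloc[j])]
                     + sum(w[d - 1:e]) / s[alloc[j]]
                     + delta[e] / b[(alloc[j], alloc[j + 1])])
    return max(terms), sum(terms)

# Toy instance: 3 stages in two intervals [S1] and [S2, S3].
delta = [2, 1, 1, 2]
w = [4, 3, 3]
s = {"P1": 2.0, "P2": 1.0}
b = {("in", "P1"): 1.0, ("P1", "P2"): 1.0, ("P2", "out"): 1.0}
period, latency = map_costs(delta, w, [(1, 1), (2, 3)],
                            ["in", "P1", "P2", "out"], s, b)
print(period, latency)   # 9.0 14.0
```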
44 Fork graphs. Map any partition of the graph onto the processors: q intervals, q ≤ p; the first interval contains S_0 and possibly S_1 to S_k, and the next intervals contain independent stages. What is T_period? It depends on the communication model: is it possible to start communications as soon as S_0 is done, and in which order are they performed? Informally: T_period = maximum over the processors of the time needed to receive data, compute, and output the result; T_latency = time elapsed between the input of a data set to S_0 and the completion of the last computation for this data set. A simpler model is used for the formal analysis.
50 Back to a simpler problem. No communication costs nor overheads. Cost to execute S_i on P_u alone: w_i / s_u. Cost to data-parallelize [S_i, S_j] (i = j for a pipeline; 0 < i ≤ j or i = j = 0 for a fork) on k processors P_{q_1}, ..., P_{q_k}: (Σ_{l=i}^{j} w_l) / (Σ_{u=1}^{k} s_{q_u}); this cost is both the T_period of the assigned processors and the delay to traverse the interval. Cost to replicate [S_i, S_j] on k processors: (Σ_{l=i}^{j} w_l) / (k · min_{1≤u≤k} s_{q_u}); this cost is the T_period of the assigned processors, while the delay to traverse the interval is the time needed by the slowest processor, t_max = (Σ_{l=i}^{j} w_l) / (min_{1≤u≤k} s_{q_u}). With these formulas it is easy to compute T_period for both graphs, and T_latency for pipeline graphs.
55 Latency for a fork? Partition the stages into q sets I_r (1 ≤ r ≤ q ≤ p), with S_0 ∈ I_1, mapped to k processors P_{q_1}, ..., P_{q_k}. Let t_max(r) be the delay of the r-th set (1 ≤ r ≤ q), computed as before. Flexible communication model: the computations of I_r, r ≥ 2, start as soon as the computation of S_0 is completed. Let s_0 be the speed at which S_0 is processed: s_0 = Σ_{u=1}^{k} s_{q_u} if I_1 is data-parallelized, and s_0 = min_{1≤u≤k} s_{q_u} if I_1 is replicated. Then

T_latency = max( t_max(1), w_0/s_0 + max_{2≤r≤q} t_max(r) )
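The fork-latency formula is a one-liner once t_max(r) and s_0 are known; a sketch with assumed toy delays (function and parameter names are mine):

```python
def fork_latency(w0, set1_speeds, set1_data_parallel, tmax):
    """T_latency of a fork mapping under the flexible model.
    tmax[r] is the delay t_max(r+1) of the (r+1)-th set; the first set contains S_0.
    s_0 = sum of the speeds if I_1 is data-parallelized, min if replicated."""
    s0 = sum(set1_speeds) if set1_data_parallel else min(set1_speeds)
    return max(tmax[0], w0 / s0 + max(tmax[1:]))

# S_0 (w0 = 4) data-parallelized with its set on two unit-speed processors,
# branch sets with delays 2 and 5 start once S_0 finishes at time 4/2 = 2.
print(fork_latency(4, [1, 1], True, [3, 2, 5]))   # max(3, 2 + 5) = 7.0
```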
59 Optimization problem. Given an application graph (n-stage pipeline or (n + 1)-stage fork), a target platform (Homogeneous with p identical processors, or Heterogeneous with p different-speed processors), a mapping strategy with replication (either with or without data-parallelization), and an objective (period T_period or latency T_latency), determine an interval-based mapping that minimizes the objective. This yields 16 optimization problems.
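The count of 16 is just the Cartesian product of the four binary choices listed above:

```python
from itertools import product

graphs     = ["pipeline", "fork"]
platforms  = ["Homogeneous", "Heterogeneous"]
strategies = ["replication only", "replication + data-parallelism"]
objectives = ["period", "latency"]

problems = list(product(graphs, platforms, strategies, objectives))
print(len(problems))   # 2 * 2 * 2 * 2 = 16 optimization problems
```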
62 Bi-criteria optimization problem. Given a threshold period P_threshold, determine a mapping whose period does not exceed P_threshold and that minimizes T_latency; or, given a threshold latency L_threshold, determine a mapping whose latency does not exceed L_threshold and that minimizes T_period.
64 Complexity results. Without data-parallelism, Homogeneous platforms:

Objective     | period     | latency   | bi-criteria
Hom. pipeline | -          |           |
Het. pipeline | Poly (str) |           |
Hom. fork     | -          | Poly (DP) |
Het. fork     | Poly (str) | NP-hard   |
65 Complexity results. With data-parallelism, Homogeneous platforms:

Objective     | period     | latency   | bi-criteria
Hom. pipeline | -          |           |
Het. pipeline | Poly (DP)  |           |
Hom. fork     | -          | Poly (DP) |
Het. fork     | Poly (str) | NP-hard   |
66 Complexity results. Without data-parallelism, Heterogeneous platforms:

Objective     | period       | latency    | bi-criteria
Hom. pipeline | Poly (*)     | -          | Poly (*)
Het. pipeline | NP-hard (**) | Poly (str) | NP-hard
Hom. fork     | Poly (*)     |            |
Het. fork     | NP-hard      | -          |
67 Complexity results. With data-parallelism, Heterogeneous platforms:

Objective     | period  | latency | bi-criteria
Hom. pipeline | NP-hard |         |
Het. pipeline | -       |         |
Hom. fork     | NP-hard |         |
Het. fork     | -       |         |
68 Complexity results. Most interesting case: without data-parallelism on Heterogeneous platforms, detailed next.
69 No data-parallelism, Heterogeneous platforms. For a pipeline, minimizing the latency is straightforward: map all stages on the fastest processor. Minimizing the period is NP-hard for a general pipeline (an involved reduction, similar to the heterogeneous chains-to-chains one). Homogeneous pipeline, where all stages have the same workload w: polynomial complexity, and a polynomial bi-criteria algorithm exists.
71 Lemma: form of the solution. Pipeline, no data-parallelism, Heterogeneous platform. Lemma: if an optimal solution which minimizes the pipeline period uses q processors, consider the q fastest processors P_1, ..., P_q, ordered by non-decreasing speeds s_1 ≤ ... ≤ s_q. There exists an optimal solution which replicates intervals of stages onto k intervals of processors I_r = [P_{d_r}, P_{e_r}], with 1 ≤ r ≤ k ≤ q, d_1 = 1, e_k = q, and e_r + 1 = d_{r+1} for 1 ≤ r < k. Proof: an exchange argument, which does not increase the latency.
73 Binary-search/dynamic-programming algorithm. Given a latency L and a period K: loop on the number of processors q, run a dynamic programming algorithm to minimize the latency, and report success if L is obtained. Then a binary search on L minimizes the latency for a fixed period, and a binary search on K minimizes the period for a fixed latency.
75 Dynamic programming algorithm. Compute L(n, 1, q), where L(m, i, j) is the minimum latency to map m pipeline stages on processors P_i to P_j while fitting in period K:

L(m, i, j) = min of
(1) m·w / s_i, if m·w / ((j − i + 1)·s_i) ≤ K (replicating the m stages onto processors P_i, ..., P_j)
(2) min_{1 ≤ m' < m, i ≤ k < j} { L(m', i, k) + L(m − m', k + 1, j) } (splitting the interval)

Initialization:
L(1, i, j) = w / s_i if w / ((j − i + 1)·s_i) ≤ K, and +∞ otherwise
L(m, i, i) = m·w / s_i if m·w / s_i ≤ K, and +∞ otherwise

Complexity of the dynamic programming: O(n²·p⁴). The number of iterations of the binary search is formally bounded, and very small in practice.
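The recurrence above translates directly into a memoized recursion; a minimal sketch for the homogeneous pipeline (all stages of workload w), assuming 0-based processor indices and speeds sorted by non-decreasing value so that the slowest processor of an interval is the first one:

```python
from functools import lru_cache

def min_latency(n, w, speeds, K):
    """L(n, 1, q): minimum latency to map n identical stages (workload w each)
    onto the given processors (speeds sorted non-decreasingly), period <= K.
    Returns float('inf') when K is infeasible."""
    q = len(speeds)

    @lru_cache(maxsize=None)
    def L(m, i, j):
        best = float("inf")
        # Case (1): replicate all m stages on P_i..P_j; the slowest is P_i,
        # so period = m*w / ((j-i+1)*s_i) and latency = m*w / s_i.
        if m * w / ((j - i + 1) * speeds[i]) <= K:
            best = m * w / speeds[i]
        # Case (2): split the stage interval and the processor interval.
        for mp in range(1, m):
            for k in range(i, j):
                best = min(best, L(mp, i, k) + L(m - mp, k + 1, j))
        return best

    return L(n, 0, q - 1)

# 4 stages of workload 2 on speeds (1, 2): period 3 is met by putting one
# stage on the slow processor and replicating nothing else.
print(min_latency(4, 2, (1.0, 2.0), 3))   # 5.0
```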
79 Related work. Subhlok and Vondran: extension of their work (pipeline on homogeneous platforms). Chains-to-chains: our work adds the possibility to replicate or data-parallelize. Mapping pipelined computations onto clusters and grids: DAG [Taura et al.], DataCutter [Saltz et al.]. Energy-aware mapping of pipelined computations: [Melhem et al.], three-criteria optimization. Mapping pipelined computations onto special-purpose architectures: FPGA arrays [Fabiani et al.]; fault-tolerance for embedded systems [Zhu et al.]. Mapping skeletons onto clusters and grids: use of stochastic process algebra [Benoit et al.].
80 Conclusion. Mapping structured workflow applications onto computational platforms, with replication and data-parallelism. The complexity of the most tractable instances gives insight into the combinatorial nature of the problem: pipeline and fork graphs (with an extension to fork-join), Homogeneous and Heterogeneous platforms with no communications, minimizing period or latency, and bi-criteria optimization problems. This provides a solid theoretical foundation for the study of single- and bi-criteria mappings, with the possibility to replicate and data-parallelize application stages.
82 Future work. Short term: select polynomial instances of the problem and assess their complexity when adding communications; design heuristics, based on our polynomial algorithms, to solve combinatorial instances for general application graphs structured as combinations of pipeline and fork kernels. Longer term: real experiments on heterogeneous clusters, comparing effective performance against theoretical performance.
More informationScheduling on clusters and grids
Some basics on scheduling theory Grégory Mounié, Yves Robert et Denis Trystram ID-IMAG 6 mars 2006 Some basics on scheduling theory 1 Some basics on scheduling theory Notations and Definitions List scheduling
More informationDatabase Architectures
B0B36DBS, BD6B36DBS: Database Systems h p://www.ksi.m.cuni.cz/~svoboda/courses/172-b0b36dbs/ Lecture 11 Database Architectures Authors: Tomáš Skopal, Irena Holubová Lecturer: Mar n Svoboda, mar n.svoboda@fel.cvut.cz
More informationScheduling tasks sharing files on heterogeneous master-slave platforms
Scheduling tasks sharing files on heterogeneous master-slave platforms Arnaud Giersch 1, Yves Robert 2, and Frédéric Vivien 2 1: ICPS/LSIIT, UMR CNRS ULP 7005, Strasbourg, France 2: LIP, UMR CNRS ENS Lyon
More informationMapping a Dynamic Prefix Tree on a P2P Network
Mapping a Dynamic Prefix Tree on a P2P Network Eddy Caron, Frédéric Desprez, Cédric Tedeschi GRAAL WG - October 26, 2006 Outline 1 Introduction 2 Related Work 3 DLPT architecture 4 Mapping 5 Conclusion
More informationReliability and performance optimization of pipelined real-time systems. Research Report LIP December 2009
Reliability and performance optimization of pipelined real-time systems Anne Benoit, Fanny Dufossé, Alain Girault, and Yves Robert ENS Lyon and INRIA Grenoble Rhône-Alpes, France {Firstname.Lastname}@inria.fr
More informationDESIGN AND OVERHEAD ANALYSIS OF WORKFLOWS IN GRID
I J D M S C L Volume 6, o. 1, January-June 2015 DESIG AD OVERHEAD AALYSIS OF WORKFLOWS I GRID S. JAMUA 1, K. REKHA 2, AD R. KAHAVEL 3 ABSRAC Grid workflow execution is approached as a pure best effort
More informationMapping pipeline skeletons onto heterogeneous platforms
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Mapping pipeline skeletons onto heterogeneous platforms Anne Benoit Yves Robert N 6087 January 007 Thème NUM apport de recherche ISSN 049-6399
More informationBi-criteria Pipeline Mappings for Parallel Image Processing
Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Bi-criteria Pipeline Mappings for Parallel Image Processing Anne
More informationOVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI
CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing
More informationOptimizing Latency and Reliability of Pipeline Workflow Applications
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE arxiv:0711.1231v3 [cs.dc] 26 Mar 2008 Optimizing Latency and Reliability of Pipeline Workflow Applications Anne Benoit Veronika Rehn-Sonigo
More informationParallel algorithms at ENS Lyon
Parallel algorithms at ENS Lyon Yves Robert Ecole Normale Supérieure de Lyon & Institut Universitaire de France TCPP Workshop February 2010 Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 1/
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationStrategies for Replica Placement in Tree Networks
Strategies for Replica Placement in Tree Networks http://graal.ens-lyon.fr/~lmarchal/scheduling/ 2 avril 2009 Introduction and motivation Replica placement in tree networks Set of clients (tree leaves):
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationLecture 23 Database System Architectures
CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used
More informationScheduling tasks sharing files on heterogeneous master-slave platforms
Scheduling tasks sharing files on heterogeneous master-slave platforms Arnaud Giersch a,, Yves Robert b, Frédéric Vivien b a ICPS/LSIIT, UMR CNRS ULP 7005 Parc d Innovation, Bd Sébastien Brant, BP 10413,
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationSHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008
SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationIntroduction to Parallel Computing
Introduction to Parallel Computing George Karypis Sorting Outline Background Sorting Networks Quicksort Bucket-Sort & Sample-Sort Background Input Specification Each processor has n/p elements A ordering
More informationCounting the Number of Eulerian Orientations
Counting the Number of Eulerian Orientations Zhenghui Wang March 16, 011 1 Introduction Consider an undirected Eulerian graph, a graph in which each vertex has even degree. An Eulerian orientation of the
More informationAdding Data Parallelism to Streaming Pipelines for Throughput Optimization
Adding Data Parallelism to Streaming Pipelines for Throughput Optimization Peng Li, Kunal Agrawal, Jeremy Buhler, Roger D. Chamberlain Department of Computer Science and Engineering Washington University
More informationDistributed Computing: PVM, MPI, and MOSIX. Multiple Processor Systems. Dr. Shaaban. Judd E.N. Jenne
Distributed Computing: PVM, MPI, and MOSIX Multiple Processor Systems Dr. Shaaban Judd E.N. Jenne May 21, 1999 Abstract: Distributed computing is emerging as the preferred means of supporting parallel
More informationsflow: Towards Resource-Efficient and Agile Service Federation in Service Overlay Networks
sflow: Towards Resource-Efficient and Agile Service Federation in Service Overlay Networks Mea Wang, Baochun Li, Zongpeng Li Department of Electrical and Computer Engineering University of Toronto {mea,
More informationCHAPTER 7 CONCLUSION AND FUTURE SCOPE
121 CHAPTER 7 CONCLUSION AND FUTURE SCOPE This research has addressed the issues of grid scheduling, load balancing and fault tolerance for large scale computational grids. To investigate the solution
More informationDistributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju
Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale
More informationSymmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment
Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network
More informationEnergy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms
Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms Dawei Li and Jie Wu Temple University, Philadelphia, PA, USA 2012-9-13 ICPP 2012 1 Outline 1. Introduction 2. System
More informationInterconnect Technology and Computational Speed
Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented
More information3 Competitive Dynamic BSTs (January 31 and February 2)
3 Competitive Dynamic BSTs (January 31 and February ) In their original paper on splay trees [3], Danny Sleator and Bob Tarjan conjectured that the cost of sequence of searches in a splay tree is within
More informationA Synchronization Algorithm for Distributed Systems
A Synchronization Algorithm for Distributed Systems Tai-Kuo Woo Department of Computer Science Jacksonville University Jacksonville, FL 32211 Kenneth Block Department of Computer and Information Science
More informationLoad-Balancing Iterative Computations on Heterogeneous Clusters with Shared Communication Links
Load-Balancing Iterative Computations on Heterogeneous Clusters with Shared Communication Links Arnaud Legrand, Hélène Renard, Yves Robert, and Frédéric Vivien LIP, UMR CNRS-INRIA-UCBL 5668, École normale
More informationMapping Heuristics in Heterogeneous Computing
Mapping Heuristics in Heterogeneous Computing Alexandru Samachisa Dmitriy Bekker Multiple Processor Systems (EECC756) May 18, 2006 Dr. Shaaban Overview Introduction Mapping overview Homogenous computing
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture X: Parallel Databases Topics Motivation and Goals Architectures Data placement Query processing Load balancing
More informationApproximation Algorithms for Wavelength Assignment
Approximation Algorithms for Wavelength Assignment Vijay Kumar Atri Rudra Abstract Winkler and Zhang introduced the FIBER MINIMIZATION problem in [3]. They showed that the problem is NP-complete but left
More informationTrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa
TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa EPL646: Advanced Topics in Databases Christos Hadjistyllis
More informationData Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions
Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing
More informationDesign and Analysis of Algorithms (VII)
Design and Analysis of Algorithms (VII) An Introduction to Randomized Algorithms Guoqiang Li School of Software, Shanghai Jiao Tong University Randomized Algorithms algorithms which employ a degree of
More informationThe Complexity of Minimizing Certain Cost Metrics for k-source Spanning Trees
The Complexity of Minimizing Certain Cost Metrics for ksource Spanning Trees Harold S Connamacher University of Oregon Andrzej Proskurowski University of Oregon May 9 2001 Abstract We investigate multisource
More informationLecture 1: A Monte Carlo Minimum Cut Algorithm
Randomized Algorithms Lecture 1: A Monte Carlo Minimum Cut Algorithm Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor A Monte Carlo Minimum Cut Algorithm 1 / 32
More informationApplication Provisioning in Fog Computingenabled Internet-of-Things: A Network Perspective
Application Provisioning in Fog Computingenabled Internet-of-Things: A Network Perspective Ruozhou Yu, Guoliang Xue, and Xiang Zhang Arizona State University Outlines Background and Motivation System Modeling
More informationMinimization of NBTI Performance Degradation Using Internal Node Control
Minimization of NBTI Performance Degradation Using Internal Node Control David R. Bild, Gregory E. Bok, and Robert P. Dick Department of EECS Nico Trading University of Michigan 3 S. Wacker Drive, Suite
More information6.231 DYNAMIC PROGRAMMING LECTURE 14 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 14 LECTURE OUTLINE Limited lookahead policies Performance bounds Computational aspects Problem approximation approach Vehicle routing example Heuristic cost-to-go approximation
More informationSymbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs
Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs Anan Bouakaz Pascal Fradet Alain Girault Real-Time and Embedded Technology and Applications Symposium, Vienna April 14th, 2016
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationClustering and Mapping Algorithm for Application Distribution on a Scalable FPGA Cluster
Clustering and Mapping Algorithm for Application Distribution on a Scalable FPGA Cluster Lester Kalms Diana Göhringer Ruhr-University Bochum, Germany lester.kalms@rub.de diana.goehringer@rub.de Research
More informationRandomized algorithms have several advantages over deterministic ones. We discuss them here:
CS787: Advanced Algorithms Lecture 6: Randomized Algorithms In this lecture we introduce randomized algorithms. We will begin by motivating the use of randomized algorithms through a few examples. Then
More informationThroughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions
Throughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions Abhishek Sinha Laboratory for Information and Decision Systems MIT MobiHoc, 2017 April 18, 2017 1 / 63 Introduction
More informationHighway Dimension and Provably Efficient Shortest Paths Algorithms
Highway Dimension and Provably Efficient Shortest Paths Algorithms Andrew V. Goldberg Microsoft Research Silicon Valley www.research.microsoft.com/ goldberg/ Joint with Ittai Abraham, Amos Fiat, and Renato
More informationCompilation for Heterogeneous Platforms
Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationEnergy-Aware Scheduling for Acyclic Synchronous Data Flows on Multiprocessors
Journal of Interconnection Networks c World Scientific Publishing Company Energy-Aware Scheduling for Acyclic Synchronous Data Flows on Multiprocessors DAWEI LI and JIE WU Department of Computer and Information
More informationHDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017
HDFS Architecture Gregory Kesden, CSE-291 (Storage Systems) Fall 2017 Based Upon: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoopproject-dist/hadoop-hdfs/hdfsdesign.html Assumptions At scale, hardware
More informationIt also performs many parallelization operations like, data loading and query processing.
Introduction to Parallel Databases Companies need to handle huge amount of data with high data transfer rate. The client server and centralized system is not much efficient. The need to improve the efficiency
More informationEnergy-aware mappings of series-parallel workflows onto chip multiprocessors
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Energy-aware mappings of series-parallel workflows onto chip multiprocessors Anne Benoit Rami Melhem Paul Renaud-Goud Yves Robert N 7521
More informationPYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads
PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),
More informationL41: Lab 2 - Kernel implications of IPC
L41: Lab 2 - Kernel implications of IPC Dr Robert N.M. Watson Michaelmas Term 2015 The goals of this lab are to: Continue to gain experience tracing user-kernel interactions via system calls Explore the
More informationSynthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits Dr. Travis Doom Wright State University Computer Science and Engineering Outline Introduction Microelectronics Micro economics What is design? Techniques
More informationGrid Scheduler. Grid Information Service. Local Resource Manager L l Resource Manager. Single CPU (Time Shared Allocation) (Space Shared Allocation)
Scheduling on the Grid 1 2 Grid Scheduling Architecture User Application Grid Scheduler Grid Information Service Local Resource Manager Local Resource Manager Local L l Resource Manager 2100 2100 2100
More informationCopyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1 Chapter 25 Distributed Databases and Client-Server Architectures Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 25 Outline
More informationCluster Editing with Locally Bounded Modifications Revisited
Cluster Editing with Locally Bounded Modifications Revisited Peter Damaschke Department of Computer Science and Engineering Chalmers University, 41296 Göteborg, Sweden ptr@chalmers.se Abstract. For Cluster
More informationFuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc
Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationParallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)
Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 1 Luca Trevisan January 4, 2011
Stanford University CS359G: Graph Partitioning and Expanders Handout 1 Luca Trevisan January 4, 2011 Lecture 1 In which we describe what this course is about. 1 Overview This class is about the following
More informationRESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury
RESTORE: REUSING RESULTS OF MAPREDUCE JOBS Presented by: Ahmed Elbagoury Outline Background & Motivation What is Restore? Types of Result Reuse System Architecture Experiments Conclusion Discussion Background
More informationAn Approach for Integrating Basic Retiming and Software Pipelining
An Approach for Integrating Basic Retiming and Software Pipelining Noureddine Chabini Department of Electrical and Computer Engineering Royal Military College of Canada PB 7000 Station Forces Kingston
More informationResource Allocation Strategies for In-Network Stream Processing
Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Resource Allocation Strategies for In-Network Stream Processing
More informationvsan Management Cluster First Published On: Last Updated On:
First Published On: 07-07-2016 Last Updated On: 03-05-2018 1 1. vsan Management Cluster 1.1.Solution Overview Table Of Contents 2 1. vsan Management Cluster 3 1.1 Solution Overview HyperConverged Infrastructure
More informationAuto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems
Auto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems Minesh I. Patel Ph.D. 1, Karl Jordan 1, Mattew Clark Ph.D. 1, and Devesh Bhatt
More informationDistributed Scheduling for the Sombrero Single Address Space Distributed Operating System
Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.
More informationA Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering
A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering HPRCTA 2010 Stefan Craciun Dr. Alan D. George Dr. Herman Lam Dr. Jose C. Principe November 14, 2010 NSF CHREC Center ECE Department,
More informationMulti-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service
in Cloud Data Centers with Network-as-a-Service Lin Wang*, Antonio Fernández Antaº, Fa Zhang*, Jie Wu+, Zhiyong Liu* *Institute of Computing Technology, CAS, China ºIMDEA Networks Institute, Spain + Temple
More informationLINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY
LINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY By Margaret Ann Bernard The University of the West Indies and Bhu Dev Sharma Xavier University of Louisiana, New Orleans ABSTRACT This paper introduces
More informationE-Companion: On Styles in Product Design: An Analysis of US. Design Patents
E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing
More informationSpatial Data Structures
15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie
More informationHierarchical Chubby: A Scalable, Distributed Locking Service
Hierarchical Chubby: A Scalable, Distributed Locking Service Zoë Bohn and Emma Dauterman Abstract We describe a scalable, hierarchical version of Google s locking service, Chubby, designed for use by systems
More informationScheduling Algorithms to Minimize Session Delays
Scheduling Algorithms to Minimize Session Delays Nandita Dukkipati and David Gutierrez A Motivation I INTRODUCTION TCP flows constitute the majority of the traffic volume in the Internet today Most of
More informationMaximum Differential Graph Coloring
Maximum Differential Graph Coloring Sankar Veeramoni University of Arizona Joint work with Yifan Hu at AT&T Research and Stephen Kobourov at University of Arizona Motivation Map Coloring Problem Related
More informationCS599: Convex and Combinatorial Optimization Fall 2013 Lecture 14: Combinatorial Problems as Linear Programs I. Instructor: Shaddin Dughmi
CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 14: Combinatorial Problems as Linear Programs I Instructor: Shaddin Dughmi Announcements Posted solutions to HW1 Today: Combinatorial problems
More informationA Distributed Hash Table for Shared Memory
A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash
More information6.896 Topics in Algorithmic Game Theory March 1, Lecture 8
6.896 Topics in Algorithmic Game Theory March 1, 2010 Lecture 8 Lecturer: Constantinos Daskalakis Scribe: Alan Deckelbaum, Anthony Kim NOTE: The content of these notes has not been formally reviewed by
More information16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model
Today s menu 1. What do you program? Parallel complexity and algorithms PRAM Algorithms 2. The PRAM Model Definition Metrics and notations Brent s principle A few simple algorithms & concepts» Parallel
More informationPHX: Memory Speed HPC I/O with NVM. Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan
PHX: Memory Speed HPC I/O with NVM Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan Node Local Persistent I/O? Node local checkpoint/ restart - Recover from transient failures ( node restart)
More informationBayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination
Bayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination By Shelley Zhuang,Ben Zhao,Anthony Joseph, Randy Katz,John Kubiatowicz Introduction Multimedia Streaming typically involves
More informationModalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010
A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche and Hélène Renard, LaBRI/Univ Bordeaux 1 I3S/École polytechnique universitaire de Nice-Sophia Antipolis May
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google* 정학수, 최주영 1 Outline Introduction Design Overview System Interactions Master Operation Fault Tolerance and Diagnosis Conclusions
More informationParallel Query Optimisation
Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation
More informationOn Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)
On Modularity Clustering Presented by: Presented by: Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) Modularity A quality index for clustering a graph G=(V,E) G=(VE) q( C): EC ( ) EC ( ) + ECC (, ')
More informationZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin
ZHT: Const Eventual Consistency Support For ZHT Group Member: Shukun Xie Ran Xin Outline Problem Description Project Overview Solution Maintains Replica List for Each Server Operation without Primary Server
More informationSpatial Data Structures
15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie
More information