Complexity results for throughput and latency optimization of replicated and data-parallel workflows


Anne Benoit and Yves Robert
GRAAL team, LIP, École Normale Supérieure de Lyon
Alpage meeting, June 2007

Introduction and motivation
Mapping workflow applications onto parallel platforms is a difficult challenge; heterogeneous clusters and fully heterogeneous platforms make it even more difficult!
Structured programming approach:
- easier to program (avoids deadlocks and process starvation)
- range of well-known paradigms (pipeline, farm)
- algorithmic skeletons: a help for mapping
Here: mapping pipeline and fork workflows.

Rule of the game
Consecutive data sets are fed into the workflow.
Period T_period = time interval between the beginning of execution of two consecutive data sets (throughput = 1/T_period).
Latency T_latency(x) = time elapsed between the beginning and the end of execution for a given data set x, and T_latency = max_x T_latency(x).
Map each pipeline/fork stage onto one or several processors.
Goal: minimize T_period, or T_latency, or a bi-criteria combination.

Replication and data-parallelism
Replicate stage S_k on P_1, ..., P_q: processors take whole data sets in round-robin order,
S_k on P_1: data sets 1, 4, 7, ...
S_k on P_2: data sets 2, 5, 8, ...
S_k on P_3: data sets 3, 6, 9, ...
Data-parallelize stage S_k on P_1, ..., P_q: each data set is split across the processors; e.g. a stage with w = 16 shared among P_1 (s_1 = 2), P_2 (s_2 = 1) and P_3 (s_3 = 1). A quick numeric check of both schemes follows below.
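To make the two schemes concrete, here is a minimal check in Python of the stage pictured above (w = 16 on speeds 2, 1, 1), using the period formulas made precise later in the talk; this is a sketch of the transcription, not code from the slides.

```python
# Quick check of the two schemes for one stage with w = 16 on processors of
# speeds (2, 1, 1), under the cost model stated later (no communication costs).

w, speeds = 16, [2, 1, 1]

# Data-parallelism: each data set is split across the processors, so both the
# period and the latency equal w divided by the cumulated speed.
t_dp = w / sum(speeds)                    # 16 / 4 = 4.0

# Replication: processors take whole data sets round-robin, so a new data set
# starts every w / (q * min_speed) time units: the slowest processor sets the pace.
t_rep = w / (len(speeds) * min(speeds))   # 16 / 3 ≈ 5.33

print(t_dp, t_rep)
```

Data-parallelism wins on both criteria here: with the latency formulas given later, T_latency = 4 for data-parallelism versus 16/1 = 16 for replication.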

Major contributions
Theoretical approach to the problem:
- definition of replication and data-parallelism
- formal definition of T_period and T_latency in each case
Problem complexity: focus on pipeline and fork workflows.

Outline
1 Framework
2 Working out an example
3 The problem
4 Complexity results
5 Conclusion

Pipeline graphs
[Figure: a linear chain S_1 → S_2 → ... → S_n, with computation weights w_1, ..., w_n and data sizes δ_0, δ_1, ..., δ_n along the edges.]
n stages S_k, 1 ≤ k ≤ n. Stage S_k:
- receives an input of size δ_{k−1} from S_{k−1}
- performs w_k computations
- outputs data of size δ_k to S_{k+1}

Fork graphs
[Figure: a root stage S_0 of weight w_0 sending data of size δ_0 to independent stages S_1, ..., S_n, with weights w_1, ..., w_n and output sizes δ_1, ..., δ_n.]
n + 1 stages S_k, 0 ≤ k ≤ n. S_0 is the root stage; S_1 to S_n are independent stages.
A data set first goes through stage S_0; then all other stages can execute simultaneously on it.

The platform
[Figure: processors P_in, P_u, P_v, P_out with speeds s_in, s_u, s_v, s_out, connected by links of bandwidths b_{in,u}, b_{u,v}, b_{v,out}.]
p processors P_u, 1 ≤ u ≤ p, fully interconnected.
s_u: speed of processor P_u.
Bidirectional link link_{u,v}: P_u ↔ P_v, of bandwidth b_{u,v}.
One-port model: each processor can either send, receive, or compute at any time step.

Different platforms
Fully Homogeneous: identical processors (s_u = s) and links (b_{u,v} = b): typical parallel machines.
Communication Homogeneous: different-speed processors (s_u ≠ s_v), identical links (b_{u,v} = b): networks of workstations, clusters.
Fully Heterogeneous: different-speed processors (s_u ≠ s_v) and different links (b_{u,v} ≠ b_{u',v'}): hierarchical platforms, grids.

Back to pipeline: mapping strategies
[Figure: the pipeline S_1, S_2, ..., S_k, ..., S_n under each strategy.]
One-to-one Mapping: each stage on a distinct processor.
Interval Mapping: intervals of consecutive stages on processors.
General Mapping: arbitrary sets of stages on processors.
In this work: Interval Mapping.

Chains-on-chains
Load-balance contiguous tasks among processors.
[Figure: a chain of weighted tasks; with p = 4 identical processors, the best contiguous partition achieves T_period = 20.]
NP-hard for different-speed processors, even without communications. A small solver sketch for the identical-processor case follows.
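The identical-processor case can be solved by a binary search on the period combined with a greedy feasibility check; here is a minimal sketch (the task weights are made up, since the slide's figure did not survive transcription).

```python
# Chains-on-chains on identical processors: binary-search the integer period,
# checking feasibility with a greedy left-to-right packing (exact here).

def feasible(weights, p, period):
    used, load = 1, 0
    for w in weights:
        if w > period:
            return False
        if load + w > period:          # open a new interval
            used, load = used + 1, 0
        load += w
    return used <= p

def min_period(weights, p):
    lo, hi = max(weights), sum(weights)
    while lo < hi:                     # smallest feasible period
        mid = (lo + hi) // 2
        if feasible(weights, p, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_period([5, 7, 3, 8, 6, 2, 9, 4], p=4))   # hypothetical weights -> 13
```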

29 Outline 1 Framework 2 Working out an example 3 The problem 4 Complexity results 5 Conclusion Anne.Benoit@ens-lyon.fr June 2007 Workflows complexity results Alpage meeting 14/ 33

Working out an example
[Figure: a four-stage pipeline S_1, S_2, S_3, S_4 with its workloads.]
Interval mapping, 4 processors, s_1 = 2 and s_2 = s_3 = s_4 = 1.
Optimal period? T_period = 7: S_1 → P_1, S_2 S_3 → P_2, S_4 → P_3 (T_latency = 17).
Optimal latency? T_latency = 12: S_1 S_2 S_3 S_4 → P_1 (T_period = 12).
Minimum latency if T_period ≤ 10? T_latency = 14: S_1 S_2 S_3 → P_1, S_4 → P_2.
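These three answers can be verified exhaustively. The sketch below assumes stage workloads w = (14, 4, 2, 4): only w_1 = 14, w_2 + w_3 = 6 and w_4 = 4 can actually be read off the slides' numbers, so the (4, 2) split is hypothetical but consistent with every stated optimum.

```python
# Brute force over interval mappings for the 4-stage example: no replication,
# no data-parallelism, one processor per interval, no communication costs.
from itertools import combinations, permutations

w = [14, 4, 2, 4]        # assumed workloads, consistent with the slide
speeds = [2, 1, 1, 1]
n = len(w)

def mappings():
    for k in range(1, n + 1):                          # number of intervals
        for cuts in combinations(range(1, n), k - 1):  # interval borders
            bounds = [0, *cuts, n]
            loads = [sum(w[a:b]) for a, b in zip(bounds, bounds[1:])]
            for procs in permutations(range(n), k):    # distinct processors
                times = [load / speeds[q] for load, q in zip(loads, procs)]
                yield max(times), sum(times)           # (period, latency)

print(min(p for p, _ in mappings()))                   # 7.0
print(min(l for _, l in mappings()))                   # 12.0
print(min(l for p, l in mappings() if p <= 10))        # 14.0
```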

Example with replication and data-parallelism
Interval mapping, 4 processors, s_1 = 2 and s_2 = s_3 = s_4 = 1.
Replicate an interval [S_u .. S_v] on P_1, ..., P_q: processor P_i executes data sets i, q + i, 2q + i, ... (round-robin):
T_period = (Σ_{k=u}^{v} w_k) / (q · min_i s_i)   and   T_latency = q · T_period
Data-parallelize a single stage S_k on P_1, ..., P_q:
T_period = w_k / (Σ_{i=1}^{q} s_i)   and   T_latency = T_period
(e.g. a stage with w = 16 on P_1 (s_1 = 2), P_2 (s_2 = 1), P_3 (s_3 = 1): T_period = 16/4 = 4).

Optimal period?
- Data-parallelize S_1 on P_1 P_2 and replicate S_2 S_3 S_4 on P_3 P_4:
  T_period = max(14/(2+1), 10/(2·1)) = 5, T_latency = 14/3 + 10 = 14.67.
- Data-parallelize S_1 on P_2 P_3 P_4 and map S_2 S_3 S_4 on P_1:
  T_period = max(14/(1+1+1), 10/2) = 5, T_latency = 14/3 + 10/2 = 9.67 (optimal).


Interval Mapping for pipeline graphs
Several consecutive stages onto the same processor: increases the computational load, reduces the communications.
Partition [1..n] into m intervals I_j = [d_j, e_j], with d_j ≤ e_j for 1 ≤ j ≤ m, d_1 = 1, d_{j+1} = e_j + 1 for 1 ≤ j ≤ m − 1, and e_m = n.
Interval I_j is mapped onto processor P_alloc(j):

T_period = max_{1 ≤ j ≤ m} { δ_{d_j − 1} / b_{alloc(j−1),alloc(j)} + (Σ_{i=d_j}^{e_j} w_i) / s_{alloc(j)} + δ_{e_j} / b_{alloc(j),alloc(j+1)} }

T_latency = Σ_{1 ≤ j ≤ m} { δ_{d_j − 1} / b_{alloc(j−1),alloc(j)} + (Σ_{i=d_j}^{e_j} w_i) / s_{alloc(j)} + δ_{e_j} / b_{alloc(j),alloc(j+1)} }
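As a sketch, the two formulas can be evaluated directly for a given mapping. All conventions below are this sketch's own, not the talk's: stages are 1-based with a dummy w[0] = 0; delta[k] is the data size between S_k and S_{k+1}; the bandwidth table b also holds ("in", u) and (u, "out") entries for the processors of the first and last intervals; s maps each processor to its speed.

```python
# Evaluate the period and latency formulas for an interval mapping.

def interval_cost(j, intervals, alloc, w, delta, b, s):
    d, e = intervals[j]                          # I_j = [d_j, e_j]
    src = alloc[j - 1] if j > 0 else "in"        # where the input comes from
    dst = alloc[j + 1] if j + 1 < len(intervals) else "out"
    return (delta[d - 1] / b[(src, alloc[j])]    # receive the input
            + sum(w[d:e + 1]) / s[alloc[j]]      # compute the interval
            + delta[e] / b[(alloc[j], dst)])     # send the output

def period_and_latency(intervals, alloc, w, delta, b, s):
    costs = [interval_cost(j, intervals, alloc, w, delta, b, s)
             for j in range(len(intervals))]
    return max(costs), sum(costs)                # (T_period, T_latency)
```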

Fork graphs
Map any partition of the graph onto the processors: q intervals, q ≤ p. The first interval contains S_0, and possibly S_1 to S_k; the next intervals contain independent stages.
T_period? It depends on the communication model: is it possible to start communicating as soon as S_0 is done? In which order do the communications take place?
Informally:
T_period = maximum time needed by a processor to receive its data, compute, and output its result.
T_latency = time elapsed between the input of a data set to S_0 and the completion of the last computation for this data set.
A simpler model is used for the formal analysis.

Back to a simpler problem
No communication costs nor overheads.
Cost to execute S_i on P_u alone: w_i / s_u.
Cost to data-parallelize [S_i, S_j] (i = j for a pipeline; 0 < i ≤ j or i = j = 0 for a fork) on k processors P_{q_1}, ..., P_{q_k}:
(Σ_{l=i}^{j} w_l) / (Σ_{u=1}^{k} s_{q_u})
This cost is both the T_period of the assigned processors and the delay to traverse the interval.
Cost to replicate [S_i, S_j] on k processors P_{q_1}, ..., P_{q_k}:
(Σ_{l=i}^{j} w_l) / (k · min_{1 ≤ u ≤ k} s_{q_u})
This cost is the T_period of the assigned processors; the delay to traverse the interval is the time needed by the slowest processor:
t_max = (Σ_{l=i}^{j} w_l) / min_{1 ≤ u ≤ k} s_{q_u}
With these formulas (spelled out in code below), it is easy to compute T_period for both graphs, and T_latency for pipeline graphs.
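The three costs, as a short sketch in code (the helper names are mine, not the talk's):

```python
def alone_cost(w_i, s_u):
    # S_i executed by P_u alone
    return w_i / s_u

def data_parallel_cost(works, speeds):
    # period of the assigned processors == delay to traverse the interval
    return sum(works) / sum(speeds)

def replicate_period(works, speeds):
    # k processors take whole data sets round-robin
    return sum(works) / (len(speeds) * min(speeds))

def replicate_delay(works, speeds):
    # t_max: traversal delay, paid on the slowest assigned processor
    return sum(works) / min(speeds)
```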

Latency for a fork?
Partition the stages into q sets I_r (1 ≤ r ≤ q ≤ p), with S_0 ∈ I_1, mapped onto k processors P_{q_1}, ..., P_{q_k}.
t_max(r) = delay of the r-th set (1 ≤ r ≤ q), computed as before.
Flexible communication model: the computations of I_r, r ≥ 2, start as soon as the computation of S_0 is completed.
s_0 = speed at which S_0 is processed:
s_0 = Σ_{u=1}^{k} s_{q_u} if I_1 is data-parallelized,
s_0 = min_{1 ≤ u ≤ k} s_{q_u} if I_1 is replicated.

T_latency = max( t_max(1), w_0 / s_0 + max_{2 ≤ r ≤ q} t_max(r) )
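A sketch of this formula in code, reusing the costs of the previous slide. Each set I_r is described here by a triple (works, speeds, scheme), with scheme "dp" or "rep"; intervals[0] is I_1 and its works include w_0. The representation is this sketch's choice, not the talk's.

```python
def t_max(works, speeds, scheme):
    # traversal delay of one set, per the simpler cost model
    return sum(works) / (sum(speeds) if scheme == "dp" else min(speeds))

def fork_latency(w0, intervals):
    works1, speeds1, scheme1 = intervals[0]
    s0 = sum(speeds1) if scheme1 == "dp" else min(speeds1)
    rest = max((t_max(*iv) for iv in intervals[1:]), default=0.0)
    return max(t_max(works1, speeds1, scheme1), w0 / s0 + rest)
```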

Optimization problem
Given:
- an application graph (n-stage pipeline or (n + 1)-stage fork),
- a target platform (Homogeneous with p identical processors, or Heterogeneous with p different-speed processors),
- a mapping strategy with replication, either with or without data-parallelization,
- an objective (period T_period or latency T_latency),
determine an interval-based mapping that minimizes the objective.
2 × 2 × 2 × 2 = 16 optimization problems.

Bi-criteria optimization problem
Given a threshold period P_threshold, determine a mapping whose period does not exceed P_threshold and that minimizes T_latency.
Given a threshold latency L_threshold, determine a mapping whose latency does not exceed L_threshold and that minimizes T_period.


Complexity results
Without data-parallelism, Homogeneous platforms:

                 period        latency      bi-criteria
Hom. pipeline    -
Het. pipeline    Poly (str)
Hom. fork        -             Poly (DP)
Het. fork        Poly (str)    NP-hard

With data-parallelism, Homogeneous platforms:

                 period        latency      bi-criteria
Hom. pipeline    -
Het. pipeline    Poly (DP)
Hom. fork        -             Poly (DP)
Het. fork        Poly (str)    NP-hard

Without data-parallelism, Heterogeneous platforms:

                 period          latency      bi-criteria
Hom. pipeline    Poly (*)        -            Poly (*)
Het. pipeline    NP-hard (**)    Poly (str)   NP-hard
Hom. fork        Poly (*)
Het. fork        NP-hard         -

With data-parallelism, Heterogeneous platforms:

                 period       latency      bi-criteria
Hom. pipeline    NP-hard
Het. pipeline    -
Hom. fork        NP-hard
Het. fork        -

Most interesting case: without data-parallelism on Heterogeneous platforms (the table above), detailed next.

No data-parallelism, Heterogeneous platforms
For a pipeline, minimizing the latency is straightforward: map all stages onto the fastest processor.
Minimizing the period is NP-hard for a general pipeline (an involved reduction, similar to the heterogeneous chains-on-chains one).
Homogeneous pipeline (all stages have the same workload w): polynomial complexity.
Polynomial bi-criteria algorithm for a homogeneous pipeline.

Lemma: form of the solution
Pipeline, no data-parallelism, Heterogeneous platform.
Lemma. If an optimal solution minimizing the pipeline period uses q processors, consider the q fastest processors P_1, ..., P_q, ordered by non-decreasing speeds: s_1 ≤ ... ≤ s_q. There exists an optimal solution which replicates intervals of stages onto k intervals of processors I_r = [P_{d_r}, P_{e_r}], 1 ≤ r ≤ k, with d_1 = 1, e_k = q, and d_{r+1} = e_r + 1 for 1 ≤ r < k.
Proof: an exchange argument, which does not increase the latency.

Binary-search/dynamic-programming algorithm
Given a latency L and a period K:
- loop on the number of processors q;
- dynamic-programming algorithm to minimize the latency;
- success if L is obtained.
Binary search on L to minimize the latency for a fixed period; binary search on K to minimize the period for a fixed latency.

Dynamic programming algorithm
Compute L(n, 1, q), where L(m, i, j) = minimum latency to map m pipeline stages on processors P_i to P_j, while fitting in period K:

L(m, i, j) = min of
(1) m·w / s_i, if m·w / ((j − i + 1)·s_i) ≤ K   (replicating the m stages onto P_i, ..., P_j)
(2) min_{1 ≤ m' < m, i ≤ k < j} { L(m', i, k) + L(m − m', k + 1, j) }   (splitting the interval)

Initialization:
L(1, i, j) = w / s_i if w / ((j − i + 1)·s_i) ≤ K, +∞ otherwise
L(m, i, i) = m·w / s_i if m·w / s_i ≤ K, +∞ otherwise

Complexity of the dynamic programming: O(n²·p⁴). The number of iterations of the binary search is formally bounded, and very small in practice.
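A minimal sketch of the whole scheme, assuming the processors are globally sorted by non-decreasing speed s[1] ≤ ... ≤ s[p] (s[0] unused), so that the q fastest are P_{p−q+1}, ..., P_p and the leftmost index of any range is its slowest processor. The structure and names are this sketch's, not the authors' code.

```python
from functools import lru_cache

INF = float("inf")

def min_latency_for_period(n, w, s, K):
    """Minimum latency over interval mappings whose period is at most K."""
    p = len(s) - 1

    @lru_cache(maxsize=None)
    def L(m, i, j):
        # min latency to map m stages onto processors P_i..P_j (all used)
        best = INF
        if m * w / ((j - i + 1) * s[i]) <= K:    # (1) replicate the m stages
            best = m * w / s[i]                  #     delay of the slowest one
        for mp in range(1, m):                   # (2) split stages/processors
            for k in range(i, j):
                best = min(best, L(mp, i, k) + L(m - mp, k + 1, j))
        return best

    # loop on the number q of (fastest) processors actually used
    return min(L(n, p - q + 1, p) for q in range(1, p + 1))

def min_period_for_latency(n, w, s, Lmax):
    """Binary search on K over the finite set of achievable interval periods,
    keeping the smallest K whose optimal latency still fits in Lmax."""
    cands = sorted({m * w / (q * s[i])
                    for m in range(1, n + 1)
                    for i in range(1, len(s))
                    for q in range(1, len(s) - i + 1)})
    best, lo, hi = None, 0, len(cands) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if min_latency_for_period(n, w, s, cands[mid]) <= Lmax:
            best, hi = cands[mid], mid - 1       # feasible: tighten the period
        else:
            lo = mid + 1
    return best
```

Searching over the finite candidate set rather than over real values keeps the number of iterations trivially bounded, in the spirit of the bound mentioned on the slide.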


Related work
- Subhlok and Vondran: extension of their work (pipeline on homogeneous platforms).
- Chains-on-chains: in our work, the possibility to replicate or data-parallelize.
- Mapping pipelined computations onto clusters and grids: DAG [Taura et al.], DataCutter [Saltz et al.].
- Energy-aware mapping of pipelined computations: [Melhem et al.], three-criteria optimization.
- Mapping pipelined computations onto special-purpose architectures: FPGA arrays [Fabiani et al.]; fault-tolerance for embedded systems [Zhu et al.].
- Mapping skeletons onto clusters and grids: use of stochastic process algebra [Benoit et al.].

Conclusion
Mapping structured workflow applications onto computational platforms, with replication and data-parallelism.
Complexity of the most tractable instances: an insight into the combinatorial nature of the problem.
Pipeline and fork graphs, with an extension to fork-join; Homogeneous and Heterogeneous platforms with no communications.
Minimizing period or latency, and bi-criteria optimization problems.
A solid theoretical foundation for the study of single- and bi-criteria mappings, with the possibility to replicate and data-parallelize application stages.

Future work
Short term:
- select polynomial instances of the problem and assess their complexity when adding communication;
- design heuristics to solve combinatorial instances of the problem.
Longer term:
- heuristics based on our polynomial algorithms for general application graphs structured as combinations of pipeline and fork kernels;
- real experiments on heterogeneous clusters, comparing effective performance against theoretical performance.


More information

Auto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems

Auto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems Auto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems Minesh I. Patel Ph.D. 1, Karl Jordan 1, Mattew Clark Ph.D. 1, and Devesh Bhatt

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering

A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering HPRCTA 2010 Stefan Craciun Dr. Alan D. George Dr. Herman Lam Dr. Jose C. Principe November 14, 2010 NSF CHREC Center ECE Department,

More information

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service in Cloud Data Centers with Network-as-a-Service Lin Wang*, Antonio Fernández Antaº, Fa Zhang*, Jie Wu+, Zhiyong Liu* *Institute of Computing Technology, CAS, China ºIMDEA Networks Institute, Spain + Temple

More information

LINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY

LINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY LINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY By Margaret Ann Bernard The University of the West Indies and Bhu Dev Sharma Xavier University of Louisiana, New Orleans ABSTRACT This paper introduces

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie

More information

Hierarchical Chubby: A Scalable, Distributed Locking Service

Hierarchical Chubby: A Scalable, Distributed Locking Service Hierarchical Chubby: A Scalable, Distributed Locking Service Zoë Bohn and Emma Dauterman Abstract We describe a scalable, hierarchical version of Google s locking service, Chubby, designed for use by systems

More information

Scheduling Algorithms to Minimize Session Delays

Scheduling Algorithms to Minimize Session Delays Scheduling Algorithms to Minimize Session Delays Nandita Dukkipati and David Gutierrez A Motivation I INTRODUCTION TCP flows constitute the majority of the traffic volume in the Internet today Most of

More information

Maximum Differential Graph Coloring

Maximum Differential Graph Coloring Maximum Differential Graph Coloring Sankar Veeramoni University of Arizona Joint work with Yifan Hu at AT&T Research and Stephen Kobourov at University of Arizona Motivation Map Coloring Problem Related

More information

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 14: Combinatorial Problems as Linear Programs I. Instructor: Shaddin Dughmi

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 14: Combinatorial Problems as Linear Programs I. Instructor: Shaddin Dughmi CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 14: Combinatorial Problems as Linear Programs I Instructor: Shaddin Dughmi Announcements Posted solutions to HW1 Today: Combinatorial problems

More information

A Distributed Hash Table for Shared Memory

A Distributed Hash Table for Shared Memory A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash

More information

6.896 Topics in Algorithmic Game Theory March 1, Lecture 8

6.896 Topics in Algorithmic Game Theory March 1, Lecture 8 6.896 Topics in Algorithmic Game Theory March 1, 2010 Lecture 8 Lecturer: Constantinos Daskalakis Scribe: Alan Deckelbaum, Anthony Kim NOTE: The content of these notes has not been formally reviewed by

More information

16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model

16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model Today s menu 1. What do you program? Parallel complexity and algorithms PRAM Algorithms 2. The PRAM Model Definition Metrics and notations Brent s principle A few simple algorithms & concepts» Parallel

More information

PHX: Memory Speed HPC I/O with NVM. Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan

PHX: Memory Speed HPC I/O with NVM. Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan PHX: Memory Speed HPC I/O with NVM Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan Node Local Persistent I/O? Node local checkpoint/ restart - Recover from transient failures ( node restart)

More information

Bayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination

Bayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination Bayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination By Shelley Zhuang,Ben Zhao,Anthony Joseph, Randy Katz,John Kubiatowicz Introduction Multimedia Streaming typically involves

More information

Modalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010

Modalis. A First Step to the Evaluation of SimGrid in the Context of a Real Application. Abdou Guermouche and Hélène Renard, May 5, 2010 A First Step to the Evaluation of SimGrid in the Context of a Real Application Abdou Guermouche and Hélène Renard, LaBRI/Univ Bordeaux 1 I3S/École polytechnique universitaire de Nice-Sophia Antipolis May

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google* 정학수, 최주영 1 Outline Introduction Design Overview System Interactions Master Operation Fault Tolerance and Diagnosis Conclusions

More information

Parallel Query Optimisation

Parallel Query Optimisation Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation

More information

On Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)

On Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) On Modularity Clustering Presented by: Presented by: Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) Modularity A quality index for clustering a graph G=(V,E) G=(VE) q( C): EC ( ) EC ( ) + ECC (, ')

More information

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin ZHT: Const Eventual Consistency Support For ZHT Group Member: Shukun Xie Ran Xin Outline Problem Description Project Overview Solution Maintains Replica List for Each Server Operation without Primary Server

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie

More information