Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows
|
|
- Alyson Reed
- 5 years ago
- Views:
Transcription
1 Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon September 2007 September 2007 Complexity results for workflows HeteroPar 07 1/ 30
2 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
3 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
4 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
5 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
6 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
7 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
8 Introduction and motivation Mapping applications onto parallel platforms Difficult challenge Heterogeneous clusters, fully heterogeneous platforms Even more difficult! Structured programming approach Easier to program (deadlocks, process starvation) Range of well-known paradigms (pipeline, farm) Algorithmic skeleton: help for mapping Mapping skeleton workflows (pipeline, fork) onto heterogeneous platforms September 2007 Complexity results for workflows HeteroPar 07 2/ 30
9 Rule of the game Map each pipeline stage on a single processor (extended later: replication and data-parallelism) Goal: minimize execution time (extended later: throughput and latency) Several mapping strategies S 1 S 2... S k... S n The pipeline application Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 3/ 30
10 Rule of the game Map each pipeline stage on a single processor (extended later: replication and data-parallelism) Goal: minimize execution time (extended later: throughput and latency) Several mapping strategies S 1 S 2... S k... S n The pipeline application Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 3/ 30
11 Rule of the game Map each pipeline stage on a single processor (extended later: replication and data-parallelism) Goal: minimize execution time (extended later: throughput and latency) Several mapping strategies S 1 S 2... S k... S n One-to-one Mapping Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 3/ 30
12 Rule of the game Map each pipeline stage on a single processor (extended later: replication and data-parallelism) Goal: minimize execution time (extended later: throughput and latency) Several mapping strategies S 1 S 2... S k... S n Interval Mapping Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 3/ 30
13 Rule of the game Map each pipeline stage on a single processor (extended later: replication and data-parallelism) Goal: minimize execution time (extended later: throughput and latency) Several mapping strategies S 1 S 2... S k... S n General Mapping Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 3/ 30
14 Major contributions Theory Formal approach to the problem Definition of replication and data-parallelism (stages on several processors) Consider several optimization criteria Problem complexity for several cases Practice Wait for my next talk! September 2007 Complexity results for workflows HeteroPar 07 4/ 30
15 Major contributions Theory Formal approach to the problem Definition of replication and data-parallelism (stages on several processors) Consider several optimization criteria Problem complexity for several cases Practice Wait for my next talk! September 2007 Complexity results for workflows HeteroPar 07 4/ 30
16 Major contributions Theory Formal approach to the problem Definition of replication and data-parallelism (stages on several processors) Consider several optimization criteria Problem complexity for several cases Practice Wait for my next talk! September 2007 Complexity results for workflows HeteroPar 07 4/ 30
17 Outline 1 Framework 2 Working out an example 3 Complexity results 4 Conclusion Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 5/ 30
18 Outline 1 Framework 2 Working out an example 3 Complexity results 4 Conclusion Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 6/ 30
19 The application: pipeline graphs δ 0 δ 1 δ k 1 δ k δ n S 1 S 2 S k S n w 1 w 2 w k w n n stages S k, 1 k n S k : receives input of size δ k 1 from S k 1 performs w k computations outputs data of size δ k to S k+1 Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 7/ 30
20 The application: fork graphs δ 1 w 0 S 0 δ 0 δ 0 δ0 δ 0 w 1 w 2 S 2 w k S k w n S n S δ 1 δ 2 δ k δ n n + 1 stages S k, 0 k n S 0 : root stage S 1 to S n : independent stages A data-set goes through stage S 0, then it can be executed simultaneously for all other stages Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 8/ 30
21 The platform s in P in P out s out b in,u b v,out P u s u b u,v P v s v p processors P u, 1 u p, fully interconnected s u : speed of processor P u bidirectional link link u,v : P u P v, bandwidth b u,v one-port model: each processor can either send, receive or compute at any time-step Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 9/ 30
22 Different platforms NO COMMUNICATIONS Homogeneous Identical processors (s u = s): typical parallel machines Heterogeneous Different-speed processors (s u s v ), identical links since we do not consider communications (b u,v = b): networks of workstations, clusters Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 10/ 30
23 Different platforms NO COMMUNICATIONS Homogeneous Identical processors (s u = s): typical parallel machines Heterogeneous Different-speed processors (s u s v ), identical links since we do not consider communications (b u,v = b): networks of workstations, clusters Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 10/ 30
24 Rule of the game Consecutive data-sets fed into the workflow Period T period = time interval between beginning of execution of two consecutive data sets (throughput=1/t period ) Latency T latency (x) = time elapsed between beginning and end of execution for a given data set x, and T latency = max x T latency (x) Map each pipeline/fork stage on one or several processors Goal: minimize T period or T latency or bi-criteria minimization Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 11/ 30
25 Rule of the game Consecutive data-sets fed into the workflow Period T period = time interval between beginning of execution of two consecutive data sets (throughput=1/t period ) Latency T latency (x) = time elapsed between beginning and end of execution for a given data set x, and T latency = max x T latency (x) Map each pipeline/fork stage on one or several processors Goal: minimize T period or T latency or bi-criteria minimization Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 11/ 30
26 Stage types Monolithic stages: must be mapped on one single processor since computation for a data-set may depend on result of previous computation Replicable stages: can be replicated on several processors, but not parallel, i.e. a data-set must be entirely processed on a single processor Data-parallel stages: inherently parallel stages, one data-set can be computed in parallel by several processors Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 12/ 30
27 Replication Replicate stage S k on P 1,..., P q... S k 1 S k on P 2 : data sets 2, 5, 8,... S k on P 1 : data sets 1, 4, 7,... S k on P 3 : data sets 3, 5, 9,... S k+1... S k+1 may be monolithic: output order must be respected Round-robin rule to ensure output order Cannot feed more fast processors than slow ones Most efficient with similar-speed processors Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 13/ 30
28 Replication Replicate stage S k on P 1,..., P q... S k 1 S k on P 2 : data sets 2, 5, 8,... S k on P 1 : data sets 1, 4, 7,... S k on P 3 : data sets 3, 5, 9,... S k+1... S k+1 may be monolithic: output order must be respected Round-robin rule to ensure output order Cannot feed more fast processors than slow ones Most efficient with similar-speed processors Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 13/ 30
29 Data-parallelism Data-parallelize stage S k on P 1,..., P q S k (w = 16) P 1 (s 1 = 2) : P 2 (s 2 = 1) : P 3 (s 3 = 1) : Perfect sharing of the work Data-parallelize single stage only Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 14/ 30
30 Data-parallelism Data-parallelize stage S k on P 1,..., P q S k (w = 16) P 1 (s 1 = 2) : P 2 (s 2 = 1) : P 3 (s 3 = 1) : Perfect sharing of the work Data-parallelize single stage only Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 14/ 30
31 Interval Mapping for pipeline graphs Several consecutive stages onto the same processor Increase computational load, reduce communications Partition of [1..n] into m intervals I j = [d j, e j ] (with d j e j for 1 j m, d 1 = 1, d j+1 = e j + 1 for 1 j m 1 and e m = n) Interval I j mapped onto processor P alloc(j) T period = max 1 j m ej i=d j w i s alloc(j) T latency = 1 j m ej i=d j w i s alloc(j) Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 15/ 30
32 Interval Mapping for pipeline graphs Several consecutive stages onto the same processor Increase computational load, reduce communications Partition of [1..n] into m intervals I j = [d j, e j ] (with d j e j for 1 j m, d 1 = 1, d j+1 = e j + 1 for 1 j m 1 and e m = n) Interval I j mapped onto processor P alloc(j) T period = max 1 j m ej i=d j w i s alloc(j) T latency = 1 j m ej i=d j w i s alloc(j) Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 15/ 30
33 Interval Mapping for pipeline graphs Several consecutive stages onto the same processor Increase computational load, reduce communications Partition of [1..n] into m intervals I j = [d j, e j ] (with d j e j for 1 j m, d 1 = 1, d j+1 = e j + 1 for 1 j m 1 and e m = n) Interval I j mapped onto processor P alloc(j) T period = max 1 j m ej i=d j w i s alloc(j) T latency = 1 j m ej i=d j w i s alloc(j) Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 15/ 30
34 Replication and data-parallelism No data-parallelism overheads Cost to execute S i on P u alone: w i s u Cost to data-parallelize [S i, S j ] (i = j for pipeline; 0 < i j or i = j = 0 for fork) on k processors P q1,..., P qk : j l=i w l k u=1 s q u Cost = T period of assigned processors Cost = delay to traverse the interval Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 16/ 30
35 Replication and data-parallelism No data-parallelism overheads Cost to execute S i on P u alone: w i s u Cost to data-parallelize [S i, S j ] (i = j for pipeline; 0 < i j or i = j = 0 for fork) on k processors P q1,..., P qk : j l=i w l k u=1 s q u Cost = T period of assigned processors Cost = delay to traverse the interval Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 16/ 30
36 Replication and data-parallelism No data-parallelism overheads Cost to execute S i on P u alone: w i s u Cost to data-parallelize [S i, S j ] (i = j for pipeline; 0 < i j or i = j = 0 for fork) on k processors P q1,..., P qk : j l=i w l k u=1 s q u Cost = T period of assigned processors Cost = delay to traverse the interval Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 16/ 30
37 Replication and data-parallelism Cost to replicate [S i, S j ] on k processors P q1,..., P qk : j l=i w l k min 1 u k s qu. Cost = T period of assigned processors Delay to traverse the interval = time needed by slowest processor: j l=i t max = w l min 1 u k s qu With these formulas: easy to compute T period and T latency for pipeline graphs Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 17/ 30
38 Replication and data-parallelism Cost to replicate [S i, S j ] on k processors P q1,..., P qk : j l=i w l k min 1 u k s qu. Cost = T period of assigned processors Delay to traverse the interval = time needed by slowest processor: j l=i t max = w l min 1 u k s qu With these formulas: easy to compute T period and T latency for pipeline graphs Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 17/ 30
39 Outline 1 Framework 2 Working out an example 3 Complexity results 4 Conclusion Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 18/ 30
40 Working out an example S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 19/ 30
41 Working out an example S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? T period = 7, S 1 P 1, S 2 S 3 P 2, S 4 P 3 (T latency = 17) Optimal latency? Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 19/ 30
42 Working out an example S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? T period = 7, S 1 P 1, S 2 S 3 P 2, S 4 P 3 (T latency = 17) Optimal latency? T latency = 12, S 1 S 2 S 3 S 4 P 1 (T period = 12) Min. latency if T period 10? Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 19/ 30
43 Working out an example S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? T period = 7, S 1 P 1, S 2 S 3 P 2, S 4 P 3 (T latency = 17) Optimal latency? T latency = 12, S 1 S 2 S 3 S 4 P 1 (T period = 12) Min. latency if T period 10? T latency = 14, S 1 S 2 S 3 P 1, S 4 P 2 Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 19/ 30
44 Example with replication and data-parallelism S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 20/ 30
45 Example with replication and data-parallelism S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 20/ 30
46 Example with replication and data-parallelism S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? S 1 DP P 1P 2, S 2 S 3 S 4 REP P 3P 4 T period = max( , ) = 5, T latency = Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 20/ 30
47 Example with replication and data-parallelism S 1 S 2 S 3 S Interval mapping, 4 processors, s 1 = 2 and s 2 = s 3 = s 4 = 1 Optimal period? S 1 DP P 1P 2, S 2 S 3 S 4 REP P 3P 4 T period = max( , ) = 5, T latency = DP S 1 P 2P 3 P 4, S 2 S 3 S 4 P 1 14 T period = max( 1+1+1, ) = 5, T latency = 9.67 (optimal) Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 20/ 30
48 Outline 1 Framework 2 Working out an example 3 Complexity results 4 Conclusion Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 21/ 30
49 Complexity results Pipeline and fork graphs No communications Homogeneous or Heterogeneous platforms Interval Mapping only Replicable stages, and either data-parallelism or not Bi-criteria optimization September 2007 Complexity results for workflows HeteroPar 07 22/ 30
50 Complexity results Without data-parallelism, Homogeneous platforms Objective period latency bi-criteria Hom. pipeline - Het. pipeline Poly (str) Hom. fork - Poly (DP) Het. fork Poly (str) NP-hard str = straightforward (map everything on the same proc...) DP = dynamic programming * = interesting case Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 23/ 30
51 Complexity results With data-parallelism, Homogeneous platforms Objective period latency bi-criteria Hom. pipeline - Het. pipeline Poly (DP) Hom. fork - Poly (DP) Het. fork Poly (str) NP-hard str = straightforward (map everything on the same proc...) DP = dynamic programming * = interesting case Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 23/ 30
52 Complexity results Without data-parallelism, Heterogeneous platforms Objective period latency bi-criteria Hom. pipeline Poly (*) - Poly (*) Het. pipeline NP-hard (**) Poly (str) NP-hard Hom. fork Poly (*) Het. fork NP-hard - str = straightforward (map everything on the same proc...) DP = dynamic programming * = interesting case Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 23/ 30
53 Complexity results With data-parallelism, Heterogeneous platforms Objective period latency bi-criteria Hom. pipeline NP-hard Het. pipeline - Hom. fork NP-hard Het. fork - str = straightforward (map everything on the same proc...) DP = dynamic programming * = interesting case Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 23/ 30
54 Complexity results Most interesting case: Without data-parallelism, Heterogeneous platforms Objective period latency bi-criteria Hom. pipeline Poly (*) - Poly (*) Het. pipeline NP-hard (**) Poly (str) NP-hard Hom. fork Poly (*) Het. fork NP-hard - Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 23/ 30
55 Complexity results Most interesting case: Without data-parallelism, Heterogeneous platforms Objective period latency bi-criteria Hom. pipeline Poly (*) - Poly (*) Het. pipeline NP-hard (**) Poly (str) NP-hard Hom. fork Poly (*) Het. fork NP-hard - Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 23/ 30
56 No data-parallelism, Heterogeneous platforms For pipeline, minimizing the latency is straightforward: map all stages on fastest proc Minimizing the period is NP-hard (involved reduction similar to the heterogeneous chain-to-chain one) for general pipeline Homogeneous pipeline: all stages have same workload w: in this case, polynomial complexity. Polynomial bi-criteria algorithm for homogeneous pipeline September 2007 Complexity results for workflows HeteroPar 07 24/ 30
57 No data-parallelism, Heterogeneous platforms For pipeline, minimizing the latency is straightforward: map all stages on fastest proc Minimizing the period is NP-hard (involved reduction similar to the heterogeneous chain-to-chain one) for general pipeline Homogeneous pipeline: all stages have same workload w: in this case, polynomial complexity. Polynomial bi-criteria algorithm for homogeneous pipeline September 2007 Complexity results for workflows HeteroPar 07 24/ 30
58 Lemma: form of the solution Pipeline, no data-parallelism, Heterogeneous platform Lemma If an optimal solution which minimizes pipeline period uses q processors, consider q fastest processors P 1,..., P q, ordered by non-decreasing speeds: s 1... s q. There exists an optimal solution which replicates intervals of stages onto k intervals of processors I r = [P dr, P er ], with 1 r k q, d 1 = 1, e k = q, and e r + 1 = d r+1 for 1 r < k. Proof: exchange argument, which does not increase latency Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 25/ 30
59 Lemma: form of the solution Pipeline, no data-parallelism, Heterogeneous platform Lemma If an optimal solution which minimizes pipeline period uses q processors, consider q fastest processors P 1,..., P q, ordered by non-decreasing speeds: s 1... s q. There exists an optimal solution which replicates intervals of stages onto k intervals of processors I r = [P dr, P er ], with 1 r k q, d 1 = 1, e k = q, and e r + 1 = d r+1 for 1 r < k. Proof: exchange argument, which does not increase latency Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 25/ 30
60 Binary-search/Dynamic programming algorithm Given latency L, given period K Loop on number of processors q Dynamic programming algorithm to minimize latency Success if L is obtained Binary search on L to minimize latency for fixed period Binary search on K to minimize period for fixed latency Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 26/ 30
61 Binary-search/Dynamic programming algorithm Given latency L, given period K Loop on number of processors q Dynamic programming algorithm to minimize latency Success if L is obtained Binary search on L to minimize latency for fixed period Binary search on K to minimize period for fixed latency Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 26/ 30
62 Dynamic programming algorithm Compute L(n, 1, q), where L(m, i, j) = minimum latency to map m pipeline stages on processors P i to P j, while fitting in period K. L(m, i, j) = min 1 m < m i k < j { m.w s i if m.w (j i).s i K (1) L(m, i, k) + L(m m, k + 1, j) (2) Case (1): replicating m stages onto processors P i,..., P j Case (2): splitting the interval Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 27/ 30
63 Dynamic programming algorithm Compute L(n, 1, q), where L(m, i, j) = minimum latency to map m pipeline stages on processors P i to P j, while fitting in period K. L(m, i, j) = Initialization: min 1 m < m i k < j L(1, i, j) = L(m, i, i) = { m.w s i if m.w (j i).s i K (1) L(m, i, k) + L(m m, k + 1, j) (2) { w w s i if (j i).s i K + otherwise { m.w s i m.w if s i + otherwise K Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 27/ 30
64 Dynamic programming algorithm Compute L(n, 1, q), where L(m, i, j) = minimum latency to map m pipeline stages on processors P i to P j, while fitting in period K. L(m, i, j) = min 1 m < m i k < j { m.w s i if m.w (j i).s i K (1) L(m, i, k) + L(m m, k + 1, j) (2) Complexity of the dynamic programming: O(n 2.p 4 ) Number of iterations of the binary search formally bounded, very small number of iterations in practice. Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 27/ 30
65 Outline 1 Framework 2 Working out an example 3 Complexity results 4 Conclusion Anne.Benoit@ens-lyon.fr September 2007 Complexity results for workflows HeteroPar 07 28/ 30
66 Conclusion Theoretical side Complexity results for several cases. Solid theoretical foundation for study of single/bi-criteria mappings, with possibility to replicate and data-parallelize application stages. Practical side Optimal polynomial algorithms. Some heuristics on particular cases (stay for next talk ). Future work Heuristics based on our polynomial algorithms for general application graphs structured as combinations of pipeline and fork kernels. Lots of open problems. September 2007 Complexity results for workflows HeteroPar 07 29/ 30
67 Related work Subhlok and Vondran Extension of their work (pipeline on hom platforms) Chains-to-chains In our work possibility to replicate or data-parallelize Mapping pipelined computations onto clusters and grids DAG [Taura et al.], DataCutter [Saltz et al.] Energy-aware mapping of pipelined computations [Melhem et al.], three-criteria optimization Mapping pipelined computations onto special-purpose architectures FPGA arrays [Fabiani et al.]. Fault-tolerance for embedded systems [Zhu et al.] Mapping skeletons onto clusters and grids Use of stochastic process algebra [Benoit et al.] September 2007 Complexity results for workflows HeteroPar 07 30/ 30
Complexity results for throughput and latency optimization of replicated and data-parallel workflows
Complexity results for throughput and latency optimization of replicated and data-parallel workflows Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon June 2007 Anne.Benoit@ens-lyon.fr
More informationMapping pipeline skeletons onto heterogeneous platforms
Mapping pipeline skeletons onto heterogeneous platforms Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon January 2007 Yves.Robert@ens-lyon.fr January 2007 Mapping pipeline skeletons
More informationComplexity results for throughput and latency optimization of replicated and data-parallel workflows
Complexity results for throughput and latency optimization of replicated and data-parallel workflows Anne Benoit and Yves Robert LIP, ENS Lyon, 46 Allée d Italie, 69364 Lyon Cedex 07, France UMR 5668 -
More informationComplexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows
DOI 10.1007/s00453-008-9229-4 Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows Anne Benoit Yves Robert Received: 6 April 2007 / Accepted: 19 September
More informationMapping Linear Workflows with Computation/Communication Overlap
Mapping Linear Workflows with Computation/Communication Overlap 1 Kunal Agrawal 1, Anne Benoit,2 and Yves Robert 2 1 CSAIL, Massachusetts Institute of Technology, USA 2 LIP, École Normale Supérieure de
More informationOptimizing Latency and Reliability of Pipeline Workflow Applications
Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Optimizing Latency and Reliability of Pipeline Workflow Applications
More informationMulti-criteria scheduling of pipeline workflows
Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Multi-criteria scheduling of pipeline workflows Anne Benoit, Veronika
More informationScheduling Tasks Sharing Files from Distributed Repositories
from Distributed Repositories Arnaud Giersch 1, Yves Robert 2 and Frédéric Vivien 2 1 ICPS/LSIIT, University Louis Pasteur, Strasbourg, France 2 École normale supérieure de Lyon, France September 1, 2004
More informationMapping pipeline skeletons onto heterogeneous platforms
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Mapping pipeline skeletons onto heterogeneous platforms Anne Benoit Yves Robert N 6087 January 007 Thème NUM apport de recherche ISSN 049-6399
More informationScheduling on clusters and grids
Some basics on scheduling theory Grégory Mounié, Yves Robert et Denis Trystram ID-IMAG 6 mars 2006 Some basics on scheduling theory 1 Some basics on scheduling theory Notations and Definitions List scheduling
More informationScheduling tasks sharing files on heterogeneous master-slave platforms
Scheduling tasks sharing files on heterogeneous master-slave platforms Arnaud Giersch 1, Yves Robert 2, and Frédéric Vivien 2 1: ICPS/LSIIT, UMR CNRS ULP 7005, Strasbourg, France 2: LIP, UMR CNRS ENS Lyon
More informationOptimizing Latency and Reliability of Pipeline Workflow Applications
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE arxiv:0711.1231v3 [cs.dc] 26 Mar 2008 Optimizing Latency and Reliability of Pipeline Workflow Applications Anne Benoit Veronika Rehn-Sonigo
More informationBi-criteria Pipeline Mappings for Parallel Image Processing
Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Bi-criteria Pipeline Mappings for Parallel Image Processing Anne
More informationReliability and performance optimization of pipelined real-time systems. Research Report LIP December 2009
Reliability and performance optimization of pipelined real-time systems Anne Benoit, Fanny Dufossé, Alain Girault, and Yves Robert ENS Lyon and INRIA Grenoble Rhône-Alpes, France {Firstname.Lastname}@inria.fr
More informationDatabase Architectures
B0B36DBS, BD6B36DBS: Database Systems h p://www.ksi.m.cuni.cz/~svoboda/courses/172-b0b36dbs/ Lecture 11 Database Architectures Authors: Tomáš Skopal, Irena Holubová Lecturer: Mar n Svoboda, mar n.svoboda@fel.cvut.cz
More informationMapping pipeline skeletons onto heterogeneous platforms
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Mapping pipeline skeletons onto heterogeneous platforms Anne Benoit Yves Robert N 6087 January 007 Thème NUM apport de recherche ISSN 049-6399
More informationScheduling tasks sharing files on heterogeneous master-slave platforms
Scheduling tasks sharing files on heterogeneous master-slave platforms Arnaud Giersch a,, Yves Robert b, Frédéric Vivien b a ICPS/LSIIT, UMR CNRS ULP 7005 Parc d Innovation, Bd Sébastien Brant, BP 10413,
More informationOVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI
CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing
More informationMapping a Dynamic Prefix Tree on a P2P Network
Mapping a Dynamic Prefix Tree on a P2P Network Eddy Caron, Frédéric Desprez, Cédric Tedeschi GRAAL WG - October 26, 2006 Outline 1 Introduction 2 Related Work 3 DLPT architecture 4 Mapping 5 Conclusion
More informationSHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008
SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem
More informationParallel algorithms at ENS Lyon
Parallel algorithms at ENS Lyon Yves Robert Ecole Normale Supérieure de Lyon & Institut Universitaire de France TCPP Workshop February 2010 Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 1/
More informationR-Storm: A Resource-Aware Scheduler for STORM. Mohammad Hosseini Boyang Peng Zhihao Hong Reza Farivar Roy Campbell
R-Storm: A Resource-Aware Scheduler for STORM Mohammad Hosseini Boyang Peng Zhihao Hong Reza Farivar Roy Campbell Introduction STORM is an open source distributed real-time data stream processing system
More informationThroughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions
Throughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions Abhishek Sinha Laboratory for Information and Decision Systems MIT MobiHoc, 2017 April 18, 2017 1 / 63 Introduction
More informationInterconnect Technology and Computational Speed
Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationEvaluating the Performance of Skeleton-Based High Level Parallel Programs
Evaluating the Performance of Skeleton-Based High Level Parallel Programs Anne Benoit, Murray Cole, Stephen Gilmore, and Jane Hillston School of Informatics, The University of Edinburgh, James Clerk Maxwell
More informationDESIGN AND OVERHEAD ANALYSIS OF WORKFLOWS IN GRID
I J D M S C L Volume 6, o. 1, January-June 2015 DESIG AD OVERHEAD AALYSIS OF WORKFLOWS I GRID S. JAMUA 1, K. REKHA 2, AD R. KAHAVEL 3 ABSRAC Grid workflow execution is approached as a pure best effort
More informationDistributed Computing: PVM, MPI, and MOSIX. Multiple Processor Systems. Dr. Shaaban. Judd E.N. Jenne
Distributed Computing: PVM, MPI, and MOSIX Multiple Processor Systems Dr. Shaaban Judd E.N. Jenne May 21, 1999 Abstract: Distributed computing is emerging as the preferred means of supporting parallel
More informationEnhancing Throughput of
Enhancing Throughput of NCA 2017 Zhongmiao Li, Peter Van Roy and Paolo Romano Enhancing Throughput of Partially Replicated State Machines via NCA 2017 Zhongmiao Li, Peter Van Roy and Paolo Romano Enhancing
More informationEnergy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms
Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms Dawei Li and Jie Wu Temple University, Philadelphia, PA, USA 2012-9-13 ICPP 2012 1 Outline 1. Introduction 2. System
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationA Synchronization Algorithm for Distributed Systems
A Synchronization Algorithm for Distributed Systems Tai-Kuo Woo Department of Computer Science Jacksonville University Jacksonville, FL 32211 Kenneth Block Department of Computer and Information Science
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationDistributed Systems. Overview. Distributed Systems September A distributed system is a piece of software that ensures that:
Distributed Systems Overview Distributed Systems September 2002 1 Distributed System: Definition A distributed system is a piece of software that ensures that: A collection of independent computers that
More informationStrategies for Replica Placement in Tree Networks
Strategies for Replica Placement in Tree Networks http://graal.ens-lyon.fr/~lmarchal/scheduling/ 2 avril 2009 Introduction and motivation Replica placement in tree networks Set of clients (tree leaves):
More informationHighway Dimension and Provably Efficient Shortest Paths Algorithms
Highway Dimension and Provably Efficient Shortest Paths Algorithms Andrew V. Goldberg Microsoft Research Silicon Valley www.research.microsoft.com/ goldberg/ Joint with Ittai Abraham, Amos Fiat, and Renato
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationLecture 23 Database System Architectures
CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture X: Parallel Databases Topics Motivation and Goals Architectures Data placement Query processing Load balancing
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationDesign and Analysis of Algorithms (VII)
Design and Analysis of Algorithms (VII) An Introduction to Randomized Algorithms Guoqiang Li School of Software, Shanghai Jiao Tong University Randomized Algorithms algorithms which employ a degree of
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationA Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering
A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering HPRCTA 2010 Stefan Craciun Dr. Alan D. George Dr. Herman Lam Dr. Jose C. Principe November 14, 2010 NSF CHREC Center ECE Department,
More informationGrid Scheduler. Grid Information Service. Local Resource Manager L l Resource Manager. Single CPU (Time Shared Allocation) (Space Shared Allocation)
Scheduling on the Grid 1 2 Grid Scheduling Architecture User Application Grid Scheduler Grid Information Service Local Resource Manager Local Resource Manager Local L l Resource Manager 2100 2100 2100
More informationFormal Verification of a Floating-Point Elementary Function
Introduction Coq & Flocq Coq.Interval Gappa Conclusion Formal Verification of a Floating-Point Elementary Function Inria Saclay Île-de-France & LRI, Université Paris Sud, CNRS 2015-06-25 Introduction Coq
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More information16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model
Today s menu 1. What do you program? Parallel complexity and algorithms PRAM Algorithms 2. The PRAM Model Definition Metrics and notations Brent s principle A few simple algorithms & concepts» Parallel
More informationPHX: Memory Speed HPC I/O with NVM. Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan
PHX: Memory Speed HPC I/O with NVM Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan Node Local Persistent I/O? Node local checkpoint/ restart - Recover from transient failures ( node restart)
More informationCPU Scheduling. CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections )
CPU Scheduling CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections 6.7.2 6.8) 1 Contents Why Scheduling? Basic Concepts of Scheduling Scheduling Criteria A Basic Scheduling
More informationLINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY
LINEAR CODES WITH NON-UNIFORM ERROR CORRECTION CAPABILITY By Margaret Ann Bernard The University of the West Indies and Bhu Dev Sharma Xavier University of Louisiana, New Orleans ABSTRACT This paper introduces
More informationIntroduction to Parallel Computing
Introduction to Parallel Computing George Karypis Sorting Outline Background Sorting Networks Quicksort Bucket-Sort & Sample-Sort Background Input Specification Each processor has n/p elements A ordering
More informationSymmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment
Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network
More informationCOLA: Optimizing Stream Processing Applications Via Graph Partitioning
COLA: Optimizing Stream Processing Applications Via Graph Partitioning Rohit Khandekar, Kirsten Hildrum, Sujay Parekh, Deepak Rajan, Joel Wolf, Kun-Lung Wu, Henrique Andrade, and Bugra Gedik Streaming
More informationDistributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju
Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale
More informationCSE 417 Branch & Bound (pt 4) Branch & Bound
CSE 417 Branch & Bound (pt 4) Branch & Bound Reminders > HW8 due today > HW9 will be posted tomorrow start early program will be slow, so debugging will be slow... Review of previous lectures > Complexity
More informationChapter 5: CPU Scheduling
Chapter 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Operating Systems Examples Algorithm Evaluation Chapter 5: CPU Scheduling
More informationDistributed Systems LEEC (2006/07 2º Sem.)
Distributed Systems LEEC (2006/07 2º Sem.) Introduction João Paulo Carvalho Universidade Técnica de Lisboa / Instituto Superior Técnico Outline Definition of a Distributed System Goals Connecting Users
More informationOutline. CS38 Introduction to Algorithms. Approximation Algorithms. Optimization Problems. Set Cover. Set cover 5/29/2014. coping with intractibility
Outline CS38 Introduction to Algorithms Lecture 18 May 29, 2014 coping with intractibility approximation algorithms set cover TSP center selection randomness in algorithms May 29, 2014 CS38 Lecture 18
More informationHDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017
HDFS Architecture Gregory Kesden, CSE-291 (Storage Systems) Fall 2017 Based Upon: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoopproject-dist/hadoop-hdfs/hdfsdesign.html Assumptions At scale, hardware
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 39. Automated Planning: Landmark Heuristics Malte Helmert and Gabriele Röger University of Basel May 10, 2017 Automated Planning: Overview Chapter overview: planning
More informationNew Results on Simple Stochastic Games
New Results on Simple Stochastic Games Decheng Dai 1 and Rong Ge 2 1 Tsinghua University, ddc02@mails.tsinghua.edu.cn 2 Princeton University, rongge@cs.princeton.edu Abstract. We study the problem of solving
More informationsflow: Towards Resource-Efficient and Agile Service Federation in Service Overlay Networks
sflow: Towards Resource-Efficient and Agile Service Federation in Service Overlay Networks Mea Wang, Baochun Li, Zongpeng Li Department of Electrical and Computer Engineering University of Toronto {mea,
More informationMapping a group of jobs in the error recovery of the Grid-based workflow within SLA context
Mapping a group of jobs in the error recovery of the Grid-based workflow within SLA context Dang Minh Quan International University in Germany School of Information Technology Bruchsal 76646, Germany quandm@upb.de
More informationPaxos Replicated State Machines as the Basis of a High- Performance Data Store
Paxos Replicated State Machines as the Basis of a High- Performance Data Store William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters and Peng Li March 30, 2011 Q: How to build a
More informationTrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa
TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa EPL646: Advanced Topics in Databases Christos Hadjistyllis
More informationWho needs a scheduler?
Who needs a scheduler? Anne Benoit, Loris Marchal and Yves Robert École Normale Supérieure de Lyon, France {Anne.Benoit Loris.Marchal Yves.Robert}@ens-lyon.fr October 2008 LIP Research Report RR-2008-34
More informationParallelism. CS6787 Lecture 8 Fall 2017
Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does
More informationImportant separators and parameterized algorithms
Important separators and parameterized algorithms Dániel Marx 1 1 Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI) Budapest, Hungary PCSS 2017 Vienna, Austria September
More informationCounting the Number of Eulerian Orientations
Counting the Number of Eulerian Orientations Zhenghui Wang March 16, 011 1 Introduction Consider an undirected Eulerian graph, a graph in which each vertex has even degree. An Eulerian orientation of the
More informationStaged Memory Scheduling
Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:
More informationAn Approach for Integrating Basic Retiming and Software Pipelining
An Approach for Integrating Basic Retiming and Software Pipelining Noureddine Chabini Department of Electrical and Computer Engineering Royal Military College of Canada PB 7000 Station Forces Kingston
More informationClient Server & Distributed System. A Basic Introduction
Client Server & Distributed System A Basic Introduction 1 Client Server Architecture A network architecture in which each computer or process on the network is either a client or a server. Source: http://webopedia.lycos.com
More informationPYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads
PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),
More informationCHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT
CHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT This chapter discusses software based scheduling and testing. DVFS (Dynamic Voltage and Frequency Scaling) [42] based experiments have
More informationApplication Provisioning in Fog Computingenabled Internet-of-Things: A Network Perspective
Application Provisioning in Fog Computingenabled Internet-of-Things: A Network Perspective Ruozhou Yu, Guoliang Xue, and Xiang Zhang Arizona State University Outlines Background and Motivation System Modeling
More informationAdding Data Parallelism to Streaming Pipelines for Throughput Optimization
Adding Data Parallelism to Streaming Pipelines for Throughput Optimization Peng Li, Kunal Agrawal, Jeremy Buhler, Roger D. Chamberlain Department of Computer Science and Engineering Washington University
More informationCPSC 536N: Randomized Algorithms Term 2. Lecture 10
CPSC 536N: Randomized Algorithms 011-1 Term Prof. Nick Harvey Lecture 10 University of British Columbia In the first lecture we discussed the Max Cut problem, which is NP-complete, and we presented a very
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationJoe Wingbermuehle, (A paper written under the guidance of Prof. Raj Jain)
1 of 11 5/4/2011 4:49 PM Joe Wingbermuehle, wingbej@wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download The Auto-Pipe system allows one to evaluate various resource mappings and topologies
More informationMapping Heuristics in Heterogeneous Computing
Mapping Heuristics in Heterogeneous Computing Alexandru Samachisa Dmitriy Bekker Multiple Processor Systems (EECC756) May 18, 2006 Dr. Shaaban Overview Introduction Mapping overview Homogenous computing
More informationMinimization of NBTI Performance Degradation Using Internal Node Control
Minimization of NBTI Performance Degradation Using Internal Node Control David R. Bild, Gregory E. Bok, and Robert P. Dick Department of EECS Nico Trading University of Michigan 3 S. Wacker Drive, Suite
More informationIOS: A Middleware for Decentralized Distributed Computing
IOS: A Middleware for Decentralized Distributed Computing Boleslaw Szymanski Kaoutar El Maghraoui, Carlos Varela Department of Computer Science Rensselaer Polytechnic Institute http://www.cs.rpi.edu/wwc
More informationCourse Content. Parallel & Distributed Databases. Objectives of Lecture 12 Parallel and Distributed Databases
Database Management Systems Winter 2003 CMPUT 391: Dr. Osmar R. Zaïane University of Alberta Chapter 22 of Textbook Course Content Introduction Database Design Theory Query Processing and Optimisation
More informationMultimedia Storage Servers
Multimedia Storage Servers Cyrus Shahabi shahabi@usc.edu Computer Science Department University of Southern California Los Angeles CA, 90089-0781 http://infolab.usc.edu 1 OUTLINE Introduction Continuous
More informationVarious Strategies of Load Balancing Techniques and Challenges in Distributed Systems
Various Strategies of Load Balancing Techniques and Challenges in Distributed Systems Abhijit A. Rajguru Research Scholar at WIT, Solapur Maharashtra (INDIA) Dr. Mrs. Sulabha. S. Apte WIT, Solapur Maharashtra
More informationEfficient Load Balancing and Fault tolerance Mechanism for Cloud Environment
Efficient Load Balancing and Fault tolerance Mechanism for Cloud Environment Pooja Kathalkar 1, A. V. Deorankar 2 1 Department of Computer Science and Engineering, Government College of Engineering Amravati
More informationSpatial Data Structures
15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie
More informationZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin
ZHT: Const Eventual Consistency Support For ZHT Group Member: Shukun Xie Ran Xin Outline Problem Description Project Overview Solution Maintains Replica List for Each Server Operation without Primary Server
More informationA Distributed Hash Table for Shared Memory
A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 1 Luca Trevisan January 4, 2011
Stanford University CS359G: Graph Partitioning and Expanders Handout 1 Luca Trevisan January 4, 2011 Lecture 1 In which we describe what this course is about. 1 Overview This class is about the following
More informationEnergy-Aware Scheduling for Acyclic Synchronous Data Flows on Multiprocessors
Journal of Interconnection Networks c World Scientific Publishing Company Energy-Aware Scheduling for Acyclic Synchronous Data Flows on Multiprocessors DAWEI LI and JIE WU Department of Computer and Information
More informationdata parallelism Chris Olston Yahoo! Research
data parallelism Chris Olston Yahoo! Research set-oriented computation data management operations tend to be set-oriented, e.g.: apply f() to each member of a set compute intersection of two sets easy
More informationScalability of Heterogeneous Computing
Scalability of Heterogeneous Computing Xian-He Sun, Yong Chen, Ming u Department of Computer Science Illinois Institute of Technology {sun, chenyon1, wuming}@iit.edu Abstract Scalability is a key factor
More informationApproximation Algorithms
Chapter 8 Approximation Algorithms Algorithm Theory WS 2016/17 Fabian Kuhn Approximation Algorithms Optimization appears everywhere in computer science We have seen many examples, e.g.: scheduling jobs
More informationMaximizing Restorable Throughput in MPLS Networks
Maximizing Restorable Throughput in MPLS Networks Reuven Cohen Dept. of Computer Science, Technion Gabi Nakibly National EW Research Center 1 Motivation IP networks are required to service real-time applications
More informationDynamics, Non-Cooperation, and Other Algorithmic Challenges in Peer-to-Peer Computing
Dynamics, Non-Cooperation, and Other Algorithmic Challenges in Peer-to-Peer Computing Stefan Schmid Distributed Computing Group Oberseminar TU München, Germany December 2007 Networks Neuron Networks DISTRIBUTED
More informationParallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce
Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The
More informationEnergy-aware mappings of series-parallel workflows onto chip multiprocessors
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Energy-aware mappings of series-parallel workflows onto chip multiprocessors Anne Benoit Rami Melhem Paul Renaud-Goud Yves Robert N 7521
More informationThe Round Complexity of Distributed Sorting
The Round Complexity of Distributed Sorting by Boaz Patt-Shamir Marat Teplitsky Tel Aviv University 1 Motivation Distributed sorting Infrastructure is more and more distributed Cloud Smartphones 2 CONGEST
More informationPLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters
PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters IEEE CLUSTER 2015 Chicago, IL, USA Luis Sant Ana 1, Daniel Cordeiro 2, Raphael Camargo 1 1 Federal University of ABC,
More information