Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Course: CS655. Louis Rabiet, Colorado State University. Thursday 19 September.
Outline
1 Motivation and Goal: Why use Dryad?
2 Dryad: Graph, Example, Properties
3 Some others: Ants, Pegasus
4 An improvement: Ciel (Dynamic tasks)
5 Comparison
6 Conclusion
Why use Dryad?
Parallel Programming Goal: make parallel programs easier for developers.
Current Choices:
- Google's MapReduce is a popular framework for developing cloud-scale applications.
- GPU shader languages.
- Parallel databases.
Why use Dryad? Solution and Results
How else can we achieve that? Use graphs to describe the scheduling of, and the relations between, jobs. The system handles the rest, i.e. managing the execution of the stages that compose the graph.
Results: can be performance-efficient and simpler.
System overview (figure).
Graph
A Dryad program is a graph, and several operators over graphs can be specified:
- Adding and cloning vertices
- Adding edges
- Merging two graphs
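The graph operators above can be sketched as follows. This is a hypothetical Python miniature of Dryad-style graph composition (cloning vertices, pointwise connection, complete bipartite connection), under assumed names; it is not the real Dryad API.

```python
# Hypothetical miniature of Dryad-style graph composition (not the real API).
class Graph:
    def __init__(self, vertices, edges=(), inputs=None, outputs=None):
        self.vertices = list(vertices)
        self.edges = list(edges)                  # (src, dst) pairs
        self.inputs = list(inputs or vertices)    # vertices accepting new in-edges
        self.outputs = list(outputs or vertices)  # vertices providing new out-edges

def clone(g, n):
    """Replicate a graph n times side by side (cloning vertices)."""
    vs, es, ins, outs = [], [], [], []
    for i in range(n):
        tag = lambda v, i=i: f"{v}.{i}"
        vs += [tag(v) for v in g.vertices]
        es += [(tag(a), tag(b)) for a, b in g.edges]
        ins += [tag(v) for v in g.inputs]
        outs += [tag(v) for v in g.outputs]
    return Graph(vs, es, ins, outs)

def pointwise(a, b):
    """Connect the i-th output of a to the i-th input of b (adding edges)."""
    assert len(a.outputs) == len(b.inputs)
    edges = a.edges + b.edges + list(zip(a.outputs, b.inputs))
    return Graph(a.vertices + b.vertices, edges, a.inputs, b.outputs)

def bipartite(a, b):
    """Connect every output of a to every input of b (adding edges)."""
    edges = a.edges + b.edges + [(x, y) for x in a.outputs for y in b.inputs]
    return Graph(a.vertices + b.vertices, edges, a.inputs, b.outputs)

stage1 = clone(Graph(["X"]), 3)      # three copies of vertex X
stage2 = clone(Graph(["D"]), 3)
g = pointwise(stage1, stage2)        # X.0->D.0, X.1->D.1, X.2->D.2
```

Merging two graphs then amounts to taking the union of their vertex and edge sets.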
Graph operations (figures): merge, merge with bypass, asymmetric join.
Example
Code example:

select distinct p.objid
from photoobjall p
join neighbors n               -- call this join "X"
  on p.objid = n.objid
  and n.objid < n.neighborobjid
  and p.mode = 1
join photoobjall l             -- call this join "Y"
  on l.objid = n.neighborobjid
  and l.mode = 1
  and abs((p.u-p.g)-(l.u-l.g)) < 0.05
  and abs((p.g-p.r)-(l.g-l.r)) < 0.05
  and abs((p.r-p.i)-(l.r-l.i)) < 0.05
  and abs((p.i-p.z)-(l.i-l.z)) < 0.05
Example: corresponding graph (figure).
Example
Corresponding vertex program:

GraphBuilder XSet = moduleX^N;
GraphBuilder DSet = moduleD^N;
GraphBuilder MSet = moduleM^(N*4);
GraphBuilder SSet = moduleS^(N*4);
GraphBuilder YSet = moduleY^N;
GraphBuilder HSet = moduleH^1;

GraphBuilder XInputs = (ugriz1 >= XSet) || (neighbor >= XSet);
GraphBuilder YInputs = ugriz2 >= YSet;
GraphBuilder XToY = XSet >= DSet >> MSet >= SSet;
for (i = 0; i < N*4; ++i)
{
    XToY = XToY || (SSet.GetVertex(i) >= YSet.GetVertex(i/4));
}
GraphBuilder YToH = YSet >= HSet;
GraphBuilder HOutputs = HSet >= output;
GraphBuilder final = XInputs || YInputs || XToY || YToH || HOutputs;
Properties
Some nice properties:
Channels: the communication medium can be chosen between files, TCP pipes, and shared-memory FIFOs.
Support for legacy applications: vertices can execute old code not specifically written for Dryad.
Properties (continued)
Runtime aggregation.
Fault tolerance: restart a failed vertex.
Properties: scalability (figures).
Properties: SQL (figure).
Ants
Some limitations of Dryad:
- Fault tolerance: what happens if a stage manager fails?
- No way of defining dynamic graphs.
Ants: linking ACO and Dryad
Finding a path between producers and consumers. Is finding a path sufficient? What happens if a node is already close to overutilization?
1 We need a way of evaluating the quality of a path.
2 We must avoid interference between two tasks.
Ants: abstraction of the problem
Each cell is a server. In a cell C we have N(C), its neighbours.
Φ^C_{D,n}: the probability that an ant whose destination is D, currently at C, moves next to the neighbouring cell n.
Routing table: the mean and variance of the time to go from C to D.
Ants: first type of ant, routing ants
An ant records its path and knows the final destination D and the source C_s. At each step, when the ant is at a cell C (different from D):
1 Record the queue waiting time at C.
2 Choose the next hop n (unknown neighbours first) using Φ^C_{D,n}, with detection of cycles.
3 Enqueue on the link to n.
4 Record the waiting time and the transmission time.
5 If the ant has not arrived at D, repeat from step 1; once it arrives at D, it starts travelling back to C_s.
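The per-hop choice above can be sketched as follows. The probability-table layout and all names here are illustrative assumptions, not taken from the Ants paper; the dictionary `phi` plays the role of the table Φ^C_{D,n}.

```python
import random

def routing_ant_step(C, D, phi, visited):
    """One hop of a routing ant at cell C heading for destination D.

    phi[(C, D)] maps each neighbour n of C to the probability of
    choosing n (the table written Φ^C_{D,n} above). Neighbours not
    yet visited are preferred; visited ones are pruned to avoid cycles.
    """
    probs = phi[(C, D)]
    fresh = {n: p for n, p in probs.items() if n not in visited}
    candidates = fresh or probs          # fall back if every neighbour was seen
    names = list(candidates)
    weights = [candidates[n] for n in names]
    nxt = random.choices(names, weights=weights)[0]  # weighted random hop
    visited.add(nxt)
    return nxt

# Toy 4-cell network: A can reach D through B or C.
phi = {("A", "D"): {"B": 0.7, "C": 0.3},
       ("B", "D"): {"D": 1.0},
       ("C", "D"): {"D": 1.0}}
cell, visited = "A", {"A"}
while cell != "D":
    cell = routing_ant_step(cell, "D", phi, visited)
```

In the full scheme the ant would also accumulate the waiting and transmission times along the way, then replay its recorded path backwards to update the routing tables.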
Ants: second type, scouting ants
Scouting ants try to find alternative paths from C_s to D. A placement is acceptable if the computing time of the new tasks, L_new, plus the computing time of the old tasks, L_prev, does not surpass a user-defined threshold L_T. If L_T is not defined, the ant instead ensures that the load on the server stays below α (a random variable); another random variable, g, determines the number of tasks the ant will place on the server.
The system generates scout ants with various α and g. Ants that reach D without having been able to place all their tasks crawl back and modify Φ^C_{D,n} (negative reinforcement).
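A minimal sketch of the scout ant's admission test described above; the symbols L_new, L_prev, L_T, α and g follow the slide, but the function itself and the numeric ranges are assumptions.

```python
import random

def scout_can_place(load_prev, load_new, load_threshold=None):
    """Decide whether a scout ant may place work on a server.

    If a user-defined threshold L_T is given, require
    L_prev + L_new <= L_T; otherwise fall back to a randomized
    check that the current load stays below a random cap alpha.
    """
    if load_threshold is not None:
        return load_prev + load_new <= load_threshold
    alpha = random.uniform(0.5, 1.0)   # random load cap (illustrative range)
    return load_prev < alpha

# g: random number of tasks this scout will try to place (illustrative)
g = random.randint(1, 4)
ok = scout_can_place(load_prev=3.0, load_new=1.5, load_threshold=5.0)
```

A scout that fails this test on every server along its path is exactly the "unsuccessful" ant that triggers negative reinforcement on the way back.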
Ants: third type, enforcement ants
When enough scout ants report success, enforcement ants are released: unless the scouts' reports show extraordinary results, the number of successful scout ants must exceed a threshold. The enforcement ants recompute the cost (and compare it with the scout results); if everything checks out, the placement is committed.
Ants: some drawbacks
- Evaluation is done only via simulation.
- No theoretical proof of correctness for the placement algorithm.
- Very few optimizations: what about the overhead costs?
Pegasus (figures): scheduling horizon.
Dynamic tasks: control flow (figure).
Dynamic tasks: publishing, i.e. how to handle dynamic graphs.
Dynamic tasks: lazy evaluation
Lazy evaluation is used to schedule tasks:
1 Start from the (expected) output of the root task.
2 If a task T has only concrete dependencies, execute it immediately.
3 Otherwise, block T until its inputs become concrete.
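The three scheduling rules above can be sketched as a small recursive evaluator. The task and dependency representation here is an assumption for illustration, not CIEL's actual data model.

```python
def evaluate(ref, tasks, concrete):
    """Lazily force the task producing `ref`.

    tasks: maps an output reference to (task_fn, dependency refs).
    concrete: refs whose data already exist; results are cached here.
    A task runs only once all of its dependencies are concrete.
    """
    if ref in concrete:                      # already materialised
        return concrete[ref]
    fn, deps = tasks[ref]
    args = [evaluate(d, tasks, concrete) for d in deps]  # force inputs first
    concrete[ref] = fn(*args)                # now every dependency is concrete
    return concrete[ref]

# Toy graph: "sum" depends on the concrete input "x" and the task "y".
tasks = {
    "sum": (lambda a, b: a + b, ["x", "y"]),
    "y":   (lambda a: a * 2,    ["x"]),
}
result = evaluate("sum", tasks, {"x": 10})   # forces "y" first, then "sum"
```

Evaluation starts from the root output and pulls work in, so tasks that no output transitively depends on are never executed.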
Dynamic tasks: an optimization
result = spawn_exec(executor, args, n);
Unique naming for the i-th output: name = executor:sha1(args):i.
Memoisation: if a new task's outputs have already been produced by a previous task, the new task need not be executed at all.
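The naming scheme above can be sketched like this. The `spawn_exec` below is a toy stand-in with an extra `fn` parameter (the real call takes an executor, arguments, and an output count); only the name format, executor:sha1(args):i, follows the slide.

```python
import hashlib

_memo = {}   # name -> previously produced output

def output_name(executor, args, i):
    """Deterministic name for the i-th output: executor:sha1(args):i."""
    digest = hashlib.sha1(repr(args).encode()).hexdigest()
    return f"{executor}:{digest}:{i}"

def spawn_exec(executor, fn, args, n):
    """Run fn only if its n outputs were never produced before."""
    names = [output_name(executor, args, i) for i in range(n)]
    if all(name in _memo for name in names):
        return [_memo[name] for name in names]   # memoised: skip execution
    outputs = fn(*args)
    for name, out in zip(names, outputs):
        _memo[name] = out
    return outputs

first = spawn_exec("py", lambda a, b: [a + b], (2, 3), 1)
second = spawn_exec("py", lambda a, b: [999], (2, 3), 1)  # same name: not run
```

Because the name depends only on the executor and arguments, two identical invocations collide on the same names and the second one is served from the memo table.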
Master: Ciel supports master failure.
Persistent logging
A log entry is written when a new job is launched, and the task descriptors (graphs) are periodically written to disk. If the master reboots, it looks for unmatched jobs and restarts them (their graphs have been stored).
Secondary masters
Everything is sent to every master (the primary and the secondaries), and every secondary records the workers' addresses.
Object table reconstruction
In case of a master rebooting (failure then recovery):
1 Workers detect the master failure (via heartbeats).
2 On detecting the failure, workers send their registrations to the master.
3 The master resends the objects list.
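The recovery steps above can be sketched as follows; the registration message shape and all names are assumptions for illustration.

```python
def recover_master(workers):
    """Rebuild the master's object table after a reboot.

    Each worker, having detected the failure via missed heartbeats,
    re-registers and reports which objects it still holds; the master
    merges the reports and can then resend the object list.
    """
    object_table = {}                     # object id -> set of worker addresses
    for worker in workers:
        registration = worker.register() # step 2: workers re-register
        for obj in registration["objects"]:
            object_table.setdefault(obj, set()).add(registration["address"])
    return object_table                   # step 3: basis of the resent list

class FakeWorker:
    def __init__(self, address, objects):
        self.address, self.objects = address, objects
    def register(self):
        return {"address": self.address, "objects": self.objects}

table = recover_master([FakeWorker("w1", ["a", "b"]), FakeWorker("w2", ["b"])])
```

The master's state is thus fully reconstructible from worker reports plus the persisted job descriptors, which is what makes a reboot survivable.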
Speed comparison (figure).
Map Reduce
- Simple.
- Efficient for a specific kind of program: web indexing.
- Reliability is very simple.
- The workflow is hardcoded.
- New? PL/SQL, Teradata.
Improvement: instead of a Map phase followed by a Reduce phase, run a combination of Map-Reduce stages, and reuse computation!
Dryad
- Some optimizations: aggregation (data locality).
- Much more general than MapReduce.
- Not so simple (look at the code).
- No cycles.
- Coming from SQL.
Improvement: Ciel.
Ants
- Analogy with nature.
- Distributed.
- General.
- Accounts for machine load.
- Difficult to predict behaviour.
- Overhead!
Improvement: what happens if an ant is faulty? I would replace the ants with a simpler search for optimal paths.
Pegasus
- Performance issues.
- Not general enough.
Improvement: reliability (only at the partition level).
Ciel
- Much more general.
- Dynamic task dependencies.
Improvement: random functions. How far are you from machine performance?
Perspectives
- Adding user transformations for graphs: tiling, merge, etc.
- BGP (a path-vector routing protocol).
- Look at this problem as a maximum-flow problem.
Conclusion
Microsoft decided to stop developing Dryad for commercial reasons.
Conclusion: summary
Several restrictions make the result possible: node results are deterministic, the job manager doesn't fail, and the graph is acyclic. Dryad doesn't add enough to MapReduce.
Fault-Tolerant Computer System Design ECE 695/CS 590 Putting it All Together Saurabh Bagchi ECE/CS Purdue University ECE 695/CS 590 1 Outline Looking at some practical systems that integrate multiple techniques
More informationResource Control and Reservation
1 Resource Control and Reservation Resource Control and Reservation policing: hold sources to committed resources scheduling: isolate flows, guarantees resource reservation: establish flows 2 Usage parameter
More informationRSVP 1. Resource Control and Reservation
RSVP 1 Resource Control and Reservation RSVP 2 Resource Control and Reservation policing: hold sources to committed resources scheduling: isolate flows, guarantees resource reservation: establish flows
More informationAzure MapReduce. Thilina Gunarathne Salsa group, Indiana University
Azure MapReduce Thilina Gunarathne Salsa group, Indiana University Agenda Recap of Azure Cloud Services Recap of MapReduce Azure MapReduce Architecture Application development using AzureMR Pairwise distance
More informationDistributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems Fall 2017 Exam 3 Review Paul Krzyzanowski Rutgers University Fall 2017 December 11, 2017 CS 417 2017 Paul Krzyzanowski 1 Question 1 The core task of the user s map function within a
More informationData Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of
More informationBigData and Map Reduce VITMAC03
BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to
More informationCloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationMapReduce. U of Toronto, 2014
MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in
More informationLecture 23 Database System Architectures
CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More informationReinforcement learning algorithms for non-stationary environments. Devika Subramanian Rice University
Reinforcement learning algorithms for non-stationary environments Devika Subramanian Rice University Joint work with Peter Druschel and Johnny Chen of Rice University. Supported by a grant from Southwestern
More informationDistributed Programming the Google Way
Distributed Programming the Google Way Gregor Hohpe Software Engineer www.enterpriseintegrationpatterns.com Scalable & Distributed Fault tolerant distributed disk storage: Google File System Distributed
More informationMap-Reduce. Marco Mura 2010 March, 31th
Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of
More informationLecture 7. Reminder: Homework 2, Programming Project 1 due today. Homework 3, Programming Project 2 out, due Thursday next week. Questions?
Lecture 7 Reminder: Homework 2, Programming Project 1 due today. Homework 3, Programming Project 2 out, due Thursday next week. Questions? Thursday, September 15 CS 475 Networks - Lecture 7 1 Outline Chapter
More informationToday s content. Resilient Distributed Datasets(RDDs) Spark and its data model
Today s content Resilient Distributed Datasets(RDDs) ------ Spark and its data model Resilient Distributed Datasets: A Fault- Tolerant Abstraction for In-Memory Cluster Computing -- Spark By Matei Zaharia,
More informationEfficient Dynamic Resource Allocation Using Nephele in a Cloud Environment
International Journal of Scientific & Engineering Research Volume 3, Issue 8, August-2012 1 Efficient Dynamic Resource Allocation Using Nephele in a Cloud Environment VPraveenkumar, DrSSujatha, RChinnasamy
More informationSpark Overview. Professor Sasu Tarkoma.
Spark Overview 2015 Professor Sasu Tarkoma www.cs.helsinki.fi Apache Spark Spark is a general-purpose computing framework for iterative tasks API is provided for Java, Scala and Python The model is based
More informationCS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationKonstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia,
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu } Introduction } Architecture } File
More informationMapReduce & Resilient Distributed Datasets. Yiqing Hua, Mengqi(Mandy) Xia
MapReduce & Resilient Distributed Datasets Yiqing Hua, Mengqi(Mandy) Xia Outline - MapReduce: - - Resilient Distributed Datasets (RDD) - - Motivation Examples The Design and How it Works Performance Motivation
More informationSpark and Spark SQL. Amir H. Payberah. SICS Swedish ICT. Amir H. Payberah (SICS) Spark and Spark SQL June 29, / 71
Spark and Spark SQL Amir H. Payberah amir@sics.se SICS Swedish ICT Amir H. Payberah (SICS) Spark and Spark SQL June 29, 2016 1 / 71 What is Big Data? Amir H. Payberah (SICS) Spark and Spark SQL June 29,
More informationData Centers. Tom Anderson
Data Centers Tom Anderson Transport Clarification RPC messages can be arbitrary size Ex: ok to send a tree or a hash table Can require more than one packet sent/received We assume messages can be dropped,
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationOne Trillion Edges. Graph processing at Facebook scale
One Trillion Edges Graph processing at Facebook scale Introduction Platform improvements Compute model extensions Experimental results Operational experience How Facebook improved Apache Giraph Facebook's
More informationClustering Lecture 8: MapReduce
Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data
More informationAnnouncements. Reading Material. Map Reduce. The Map-Reduce Framework 10/3/17. Big Data. CompSci 516: Database Systems
Announcements CompSci 516 Database Systems Lecture 12 - and Spark Practice midterm posted on sakai First prepare and then attempt! Midterm next Wednesday 10/11 in class Closed book/notes, no electronic
More informationSpark. Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica.
Spark Cluster Computing with Working Sets Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica UC Berkeley Background MapReduce and Dryad raised level of abstraction in cluster
More informationL22: SC Report, Map Reduce
L22: SC Report, Map Reduce November 23, 2010 Map Reduce What is MapReduce? Example computing environment How it works Fault Tolerance Debugging Performance Google version = Map Reduce; Hadoop = Open source
More informationTensorFlow: A System for Learning-Scale Machine Learning. Google Brain
TensorFlow: A System for Learning-Scale Machine Learning Google Brain The Problem Machine learning is everywhere This is in large part due to: 1. Invention of more sophisticated machine learning models
More informationMicrosoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud
Microsoft Azure Databricks for data engineering Building production data pipelines with Apache Spark in the cloud Azure Databricks As companies continue to set their sights on making data-driven decisions
More informationThis project must be done in groups of 2 3 people. Your group must partner with one other group (or two if we have add odd number of groups).
1/21/2015 CS 739 Distributed Systems Fall 2014 PmWIki / Project1 PmWIki / Project1 The goal for this project is to implement a small distributed system, to get experience in cooperatively developing a
More informationCS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationCHAPTER 7 CONCLUSION AND FUTURE SCOPE
121 CHAPTER 7 CONCLUSION AND FUTURE SCOPE This research has addressed the issues of grid scheduling, load balancing and fault tolerance for large scale computational grids. To investigate the solution
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationData Intensive Scalable Computing
Data Intensive Scalable Computing Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Examples of Big Data Sources Wal-Mart 267 million items/day, sold at 6,000 stores HP built them
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationFLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568
FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected
More informationBig Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela
More informationMapReduce: Simplified Data Processing on Large Clusters. By Stephen Cardina
MapReduce: Simplified Data Processing on Large Clusters By Stephen Cardina The Problem You have a large amount of raw data, such as a database or a web log, and you need to get some sort of derived data
More informationPregel. Ali Shah
Pregel Ali Shah s9alshah@stud.uni-saarland.de 2 Outline Introduction Model of Computation Fundamentals of Pregel Program Implementation Applications Experiments Issues with Pregel 3 Outline Costs of Computation
More informationMapReduce Algorithm Design
MapReduce Algorithm Design Bu eğitim sunumları İstanbul Kalkınma Ajansı nın 2016 yılı Yenilikçi ve Yaratıcı İstanbul Mali Destek Programı kapsamında yürütülmekte olan TR10/16/YNY/0036 no lu İstanbul Big
More informationDryadLINQ. Distributed Computation. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India
Dryad Distributed Computation Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Distributed Batch Processing 1/34 Outline Motivation 1 Motivation
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 2: MapReduce Algorithm Design (2/2) January 14, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationPREGEL AND GIRAPH. Why Pregel? Processing large graph problems is challenging Options
Data Management in the Cloud PREGEL AND GIRAPH Thanks to Kristin Tufte 1 Why Pregel? Processing large graph problems is challenging Options Custom distributed infrastructure Existing distributed computing
More informationMapReduce-II. September 2013 Alberto Abelló & Oscar Romero 1
MapReduce-II September 2013 Alberto Abelló & Oscar Romero 1 Knowledge objectives 1. Enumerate the different kind of processes in the MapReduce framework 2. Explain the information kept in the master 3.
More informationCLOUD-SCALE FILE SYSTEMS
Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients
More informationBig Data 7. Resource Management
Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage
More informationProcess groups and message ordering
Process groups and message ordering If processes belong to groups, certain algorithms can be used that depend on group properties membership create ( name ), kill ( name ) join ( name, process ), leave
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More information