A Fast Recursive Mapping Algorithm. Department of Computer and Information Science. New Jersey Institute of Technology.


A Fast Recursive Mapping Algorithm

Song Chen and Mary M. Eshaghian
Department of Computer and Information Science
New Jersey Institute of Technology
Newark, NJ

Abstract

This paper presents a generic technique for mapping parallel algorithms onto parallel architectures. The proposed technique is a fast recursive mapping algorithm which is a component of the Cluster-M programming tool. The other components of Cluster-M are the Specification module and the Representation module. In the Specification module, for a given task specified by a high-level machine-independent program, a clustered task graph called a Spec graph is generated. In the Representation module, for a given architecture or computing organization, a clustered system graph called a Rep graph is generated. Given a task (or system) graph, a Spec (or Rep) graph can be generated using one of the clustering algorithms presented in this paper. The clustering is done only once for a given task graph (system graph) independent of any system graphs (task graphs). It is a machine-independent (application-independent) clustering; therefore, it is not repeated for different mappings. The Cluster-M mapping algorithm presented produces a sub-optimal matching of a given Spec graph containing M task modules onto a Rep graph of N processors, in O(MN) time. This generic algorithm is suitable for both the allocation problem and the scheduling problem. Its performance is compared to other leading techniques. We show that Cluster-M produces better or similar results in significantly less time and using fewer or equally many processors as compared to the other known methods.

1 Introduction

An efficient parallel algorithm designed for a parallel architecture includes a detailed outline of the accurate assignment of the concurrent computations onto processors and data transfers onto communication links, such that the overall execution time is minimized. This process may be complex depending on the application task and the underlying organization.
Given the same application task, this process must be repeated for every different architecture. To remedy this problem, one should design portable software which can be mapped onto any architecture or organization. It is essential to use portable programming tools with intelligent mapping modules which can support this process efficiently. The design of efficient techniques for mapping parallel programs onto parallel computers is the focus of this paper. The mapping problem has been described in various ways throughout the literature [5, ]. Generally, the mapping problem can be viewed as assigning a given program, which consists of a collection of task modules, onto the processing elements of the underlying architecture, so that some performance measure, e.g., total execution time, is optimized. A program can be represented in the form of a task graph, and a parallel computer

system can be represented in the form of a system graph. Mapping can be either static or dynamic. In static mapping, the assignments of the nodes of the task graphs onto the system graphs are determined prior to the execution and are not changed until the end of the execution. A static task graph or system graph can be either uniform or non-uniform. A graph is called non-uniform if the weights of the nodes are not all the same and the weights of the edges also differ. Otherwise, it is uniform. Mapping of directed task graphs (if there is a precedence relation among the task modules) is called task scheduling []. If the task graphs to be mapped are undirected, then it is called task allocation []. Whether the graphs are directed or undirected, uniform or non-uniform, there are basically four types of static mappings based on the topological structures of the task and system graphs. These are (1) mapping of specialized tasks onto specialized systems (e.g., mapping of a chain-structured task onto chain-linked processors) [8, 4, 4], (2) mapping of specialized tasks onto arbitrary systems (e.g., mapping of trees onto any architectures) [, ], (3) mapping of arbitrary tasks onto specialized systems (e.g., mapping of any tasks onto a hypercube or a completely connected network) [9, , , 5, 4, 6] and (4) mapping of arbitrary tasks onto arbitrary systems [, , 6, 7, 9, 6]. The focus of this paper is on the mapping of arbitrary tasks onto arbitrary systems. One of the earliest mapping algorithms which can map an arbitrary task onto an arbitrary system is Lo's heuristic in [7]. Basically, this heuristic repetitively uses a max-flow min-cut algorithm to find mappings of task modules onto heterogeneous processors. The time complexity of Lo's heuristic is O(MN|E_p| log M), where M is the number of task modules, N is the number of processors and |E_p| is the number of communication links between processors. El-Rewini and Lewis in [9] presented their mapping heuristic (MH).
MH is a list scheduling algorithm which maps an arbitrary task graph onto an arbitrary system graph. In list scheduling, each task module is assigned a priority. Whenever a processor is available, a task module with the highest priority is selected from the list and assigned to that processor. MH has a time complexity of O(M N ). The mapping problem can also be viewed as a graph matching problem [, ]. In this case, the task graph is to be matched against the system graph in order to minimize the overall execution time. This problem has been known to be NP-complete in its general form as well as in several restricted forms []. In an attempt to solve the problem in the general case, a number of heuristics have been introduced [, , , 6, ]. Bokhari in [] searches for the best matching of the edges of the undirected task graph versus the system graph. This heuristic algorithm is based on local search and pair-wise exchange. Lee and Aggarwal's mapping strategy is another example of this approach but considers directed task graphs [6]. They both assume the number of nodes of the task graph to be no greater than that of the system graph. The time complexities of both algorithms are O(N ). To reduce the complexity of the mapping problem, a number of approaches such as graph contraction and clustering have been studied [, 5, , 5, 6]. However, in all of these graph-matching-based techniques, only the task graph is clustered and the entire task graph is then matched against the entire system graph. In this paper, we will present a new mapping technique which not only clusters the task graph but

also clusters the system graph for more efficient mapping. The clustering is done only once for a given task graph (system graph) independent of any system graphs (task graphs). It is a machine-independent (application-independent) clustering; therefore, it is not repeated for different mappings. The mapping algorithm presented in this paper has a time complexity of only O(MN). It is part of the Cluster-M programming tool []. This generic algorithm is suitable for both the allocation problem and the scheduling problem. The task and system graphs studied in this paper have uniform weights on edges. An extended mapping algorithm which works for non-uniform task graphs and system graphs is presented in [7]. The paper continues as follows. In section 2, we present the Cluster-M clustering algorithms. The mapping algorithm is then detailed in section 3. This is followed by comparisons of our algorithm with other existing mapping algorithms (for both allocation and scheduling) in section 4. Finally, a brief conclusion is given in section 5.

2 Cluster-M Clustering

Cluster-M is a programming tool which facilitates the design and mapping of portable parallel programs. Cluster-M has three main components: Cluster-M Specification, Cluster-M Representation and the Cluster-M mapping module. Cluster-M Specifications are high-level machine-independent programs represented in the form of a multi-level clustered task graph called a Spec graph. Each clustering level in the Spec graph represents a set of concurrent computations, called Spec clusters. A Cluster-M Representation, on the other hand, represents a multi-level partitioning of a system graph called a Rep graph. At every partitioning level of the Rep graph, there are a number of clusters called Rep clusters. Each Rep cluster represents a set of processors with a certain degree of connectivity. Given a task (or system) graph, a Spec (or Rep) graph can be generated using one of the clustering algorithms described below.
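As an illustration, a multi-level clustered graph of this kind can be thought of as a tree of clusters in which each cluster's size is the number of leaf nodes beneath it. The following is a minimal sketch of such a structure; the `Cluster` class and its field names are our own illustration, not part of the Cluster-M tool:

```python
# Hypothetical sketch of a multi-level clustered graph (Spec or Rep graph).
# Leaf clusters hold task modules or processors; inner clusters hold sub-clusters.
class Cluster:
    def __init__(self, members=None, sub_clusters=None):
        self.members = members or []            # leaf-level nodes
        self.sub_clusters = sub_clusters or []  # clusters at the next lower level

    @property
    def size(self):
        # A cluster's size is the number of nodes it contains, so it always
        # equals the sum of its sub-cluster sizes (S_i = S_i1 + ... + S_ik).
        if not self.sub_clusters:
            return len(self.members)
        return sum(c.size for c in self.sub_clusters)

# A small Rep graph for 4 processors: two pairs merged under one top cluster.
procs = [Cluster(members=[i]) for i in range(4)]
pairs = [Cluster(sub_clusters=procs[:2]), Cluster(sub_clusters=procs[2:])]
top = Cluster(sub_clusters=pairs)
```

Here `top.size` is 4, the sum of the two pair sizes, matching the size invariant stated later for Spec and Rep clusters.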
The clustering is done only once for a given task graph (system graph) independent of any system graphs (task graphs). It is a machine-independent (application-independent) clustering; therefore it is not repeated for different mappings. For this reason, the time complexity of the clustering algorithms is not included in the time complexity of the mapping algorithm presented in section 3.

2.1 Clustering directed graphs

Many clustering techniques have been developed to reduce the order and size of task graphs [, 9, , , 8, 4, ]. For example, a cluster can be a clan [9], which is a set of nodes with common outside ancestors and descendants on the task graph. Cluster-M based mapping requires clustering for the task graph as well as the system graph to obtain better and faster solutions. For clustering either the task graph or the system graph, we use the following algorithm if the input graph is directed; otherwise we use the algorithm presented in the next section (2.2). In the scheduling problem, task graphs are directed, while in the task allocation problem they are not. The system graphs, on the other hand, are always assumed to be undirected (today's computers have bi-directional links). Therefore, the algorithm presented below is to be used only for directed

task graphs. In the following, we give a formal definition of directed task graphs which is also applicable to undirected task graphs, with the exception that in undirected graphs, for every i, j, (t_i, t_j) = (t_j, t_i). A task can be represented by a task graph G_t(V_t, E_t), where V_t = {t_1, ..., t_M} is a set of task modules to be executed, and E_t is a set of edges representing the partial orders and communication directions between task modules. A directed edge (t_i, t_j) represents that a data communication exists from module t_i to t_j and that t_i must be completed before t_j may begin. Furthermore, each task module t_i is associated with its amount of computation A_i. Each edge (t_i, t_j) is associated with D_ij, the amount of data required to be transmitted from module t_i to module t_j. Note that A_i and D_ij are defined for 1 ≤ i, j ≤ M. If a directed edge (t_i, t_j) exists, we call t_i a parent node of t_j and t_j a child node of t_i. If a node has more than one child, it is called a broadcast node. If a node has more than one parent, it is called a merge node. According to the data and operational precedence, nodes can be grouped into execution steps and edges can be grouped into execution phases as described below. An execution step (phase) represents a set of computations (communications) which can be carried out in parallel. Task nodes in execution step 1 are those without parent nodes. Task nodes in step i (i > 1) are those with at least one parent node in step i−1 but no parent node in step j (j ≥ i). Edges in phase i are those (t_x, t_y) where t_x is in execution step i. In this paper, we assume that the amount of data communication between any two task modules is uniform, i.e., D_ij = 1, for 1 ≤ i, j ≤ M, (t_i, t_j) ∈ E_t. This assumption leads to the simple greedy clustering in the clustering algorithm which will be described later. An extended clustering algorithm which clusters non-uniformly weighted directed task graphs is presented in [7].
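The grouping of nodes into execution steps follows directly from these definitions: a node's step is 1 if it has no parents, and otherwise one more than the largest step among its parents. A minimal sketch (function and variable names are ours, not from the paper):

```python
from collections import defaultdict

def execution_steps(nodes, edges):
    """Group task nodes into execution steps: step 1 holds nodes without
    parents; a node in step i has a parent in step i-1 and none later."""
    parents = defaultdict(list)
    for u, v in edges:                 # directed edge (u, v): u is a parent of v
        parents[v].append(u)

    step = {}
    def level(n):
        if n not in step:
            ps = parents[n]
            step[n] = 1 if not ps else 1 + max(level(p) for p in ps)
        return step[n]

    groups = defaultdict(list)
    for n in nodes:
        groups[level(n)].append(n)
    return dict(groups)

# Edges in phase i are then exactly those (tx, ty) with tx in step i.
steps = execution_steps(['a', 'b', 'c', 'd'],
                        [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')])
```

In this toy graph, a lands in step 1, b and c in step 2, and d in step 3.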
The algorithm for clustering directed graphs is presented in Figure 1. The basic idea is to merge all the nodes in each execution step if they have a common parent node or a common child node. If a parent node t_i has one or more children, one must be embedded to t_i. Each cluster has a size, which is the number of the member nodes in this cluster. A member node can be either a task module or a "supernode". A supernode is obtained by embedding one task module onto another task module or supernode (this embedding process is also known as linear clustering [4]). If a Spec cluster has a size S_i and the sizes of its sub-clusters at the lower level are S_i1, ..., S_ik, it is obvious that S_i = S_i1 + ... + S_ik. The complexity of the clustering-directed-graphs algorithm is in the order of the number of edges of the task graph, which is O(M²) in the worst case, where M is the number of nodes of the task graph. To illustrate this algorithm, the following example is presented. A task graph of 15 modules is shown in Figure 2. Each module has a computation amount of 1, and each edge carries a data communication amount of 1. This task graph contains two subgraphs which are not connected. This means that the two subtasks can be executed in parallel. The Spec graph is constructed by merging the clusters when they have communication needs, as illustrated in Figure 2. The input task graph has nodes a to o (15 nodes). The final Spec graph is a multi-layered graph containing member nodes a to i (9 nodes). For example, j, k and l are embedded to d, since j, k and l are in different execution steps and can not be executed concurrently. This will not only save the processor resources and communication cost,

Clustering-directed-graphs Algorithm
  group nodes of the given task graph into corresponding steps
  group edges of the given task graph into corresponding phases
  for all nodes at step 1, do make it into a cluster
  for all phases, do
    for all edges (ti, tj), do begin
      if tj is a merge node, then begin
        embed tj to ti
        if the parent nodes of tj are not in a cluster, then begin
          merge them into a cluster
          increase cluster size
        end
      end
      if ti is a broadcast node, then begin
        k = number of nodes in the cluster ti belongs to
        if ti has more than k children, then begin
          embed the first k children to the above k nodes
          merge the rest into the above cluster
          increase cluster size
        end
        else embed all children
      end
    end

Figure 1: Clustering-directed-graphs Algorithm.

but also reduce the mapping cost, since the Spec graph now contains only 9 member nodes instead of the original 15.

2.2 Clustering undirected graphs

The algorithm presented in this section can be used for generating the Spec graph of an undirected task graph (for the allocation problem), as well as the Rep graph of a system graph (undirected). Since the definition of directed task graphs presented in the last section is also applicable to undirected task graphs (with the exception of (t_i, t_j) = (t_j, t_i), for all i, j), in this section we only present the definition of system graphs (undirected). We then present the algorithm for generating a clustered graph (Spec graph for a task graph, or Rep graph for a system graph) out of such an undirected input graph. A parallel system can be modeled as an undirected system graph G_p(V_p, E_p). V_p = {p_1, ..., p_N} is a set of processors forming the underlying architecture, while E_p is a set of edges representing the interconnection topology of the parallel system. We assume the connections between adjacent processors of the parallel systems studied here are bi-directional. Therefore, an edge (p_i, p_j) represents that there is a direct connection between processors p_i and p_j.
The speed of processor p_i is denoted by S_i, and the transmission rate over edge (p_i, p_j) is denoted by R_ij. In this paper, we assume that S_i = 1 and R_ij = 1 for 1 ≤ i, j ≤ N, (p_i, p_j) ∈ E_p. This assumption leads to the simple greedy clustering. An extended clustering algorithm which clusters a non-uniform undirected graph is given in [7]. To construct a clustered graph (Rep graph or Spec graph) from an undirected input graph, initially, every

Figure 2: A task graph (nodes a to o) and the obtained Spec graph. The construction proceeds in steps, embedding j, k and l to d, m to e, n to g, and o to f, so that the resulting Spec graph contains member nodes a to i.

Clustering-undirected-graphs Algorithm
  for all nodes pi (ti), do make a cluster for pi (ti) at clustering level 1
  set cluster level to 1
  while merging is possible, do begin
    for all clusters c at current level, do begin
      make c into cluster c' at the next level
      delete cluster c from current level
      for all clusters x in current level, do
        if x is connected to all sub-clusters of c', then begin
          merge x into c'
          delete x from current level
        end
    end
    increment clustering level by 1
  end

Figure 3: Clustering-undirected-graphs Algorithm.

Figure 4: An undirected graph (nodes A to H) and its clustering.

node forms a cluster. This node is represented by p_i in the case of a system graph and by t_i in the case of a task graph. Then clusters which are completely connected are merged to form a new cluster. This is continued until no more merging is possible. Two clusters x and y are connected if x contains a node p_x (or t_x) and y contains a node p_y (or t_y), such that (p_x, p_y) ∈ E_p (or (t_x, t_y) ∈ E_t). Each cluster has a size, which is the number of the nodes it contains. If a Rep (or Spec) cluster has a size R_i (or S_i) and the sizes of its sub-clusters at the lower level are R_i1, ..., R_ik (or S_i1, ..., S_ik), it is obvious that R_i = R_i1 + ... + R_ik (or S_i = S_i1 + ... + S_ik). The algorithm for clustering undirected graphs is shown in Figure 3. Figure 4 shows an example. The undirected graph shown can represent a system graph; in that case the generated output as shown is a Rep graph. However, if the same input is an undirected task graph for the allocation problem, then the generated output is a Spec graph.
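Under the uniform-weight assumption, the clustering of an undirected graph can be sketched as follows. This is an illustrative simplification of the clustering-undirected-graphs algorithm (all names are ours): it merges completely connected clusters level by level until no further merging is possible:

```python
def cluster_undirected(nodes, edges):
    """Return the list of clustering levels; each level is a list of clusters
    (frozensets of nodes). The first level has one cluster per node; at each
    later level, clusters that are completely connected are merged."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def connected(a, b):  # clusters a, b are connected if some edge joins them
        return any(adj[x] & b for x in a)

    levels = [[frozenset([n]) for n in nodes]]
    while True:
        current = list(levels[-1])
        nxt, merged_any = [], False
        while current:
            parts = [current.pop(0)]       # seed a next-level cluster
            rest = []
            for x in current:
                # merge x only if it is connected to every sub-cluster so far
                if all(connected(x, s) for s in parts):
                    parts.append(x)
                    merged_any = True
                else:
                    rest.append(x)
            current = rest
            nxt.append(frozenset().union(*parts))
        if not merged_any:                 # no more merging is possible
            break
        levels.append(nxt)
    return levels

# A 4-cycle A-B-D-C-A clusters into {A,B} and {C,D}, then one top cluster.
levels = cluster_undirected('ABCD', [('A','B'), ('B','D'), ('C','D'), ('A','C')])
```

On the 4-cycle the sketch produces three levels: four singletons, then two pairs, then a single top-level cluster.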

We now analyze the running time of this implementation. For each level, we compare each cluster in that level with the other clusters in the same level and check if they form a clique. Suppose at a certain level of the system graph (undirected task graph), there are m clusters c_1, ..., c_m, with each cluster c_i containing P_i processors (T_i task modules). We have Σ_{i=1..m} P_i = N (Σ_{i=1..m} T_i = M), where N is the number of underlying processors (M is the number of task modules). The time of clustering at this level is dominated by the total number of comparisons made to determine if each cluster is connected to all sub-clusters of another cluster at the next level, which is at most Σ_{i=1..m} Σ_{j=i+1..m} P_i P_j ≤ N² (Σ_{i=1..m} Σ_{j=i+1..m} T_i T_j ≤ M²). The number of levels can be at most N−1 (or M−1). Therefore the total time complexity of this algorithm is O(N³) (or O(M³)).

3 Cluster-M Mapping Algorithm

For a given problem, a high-level machine-independent parallel solution can be presented in the form of a Cluster-M Specification, thus directly representing a Cluster-M Spec graph []. However, a Spec graph can also be generated directly from a given task graph, using one of the algorithms in the last section (the clustering-directed-graphs algorithm is used for directed task graphs in scheduling, and the clustering-undirected-graphs algorithm is used for allocation). On the other hand, given a system graph representing an underlying architecture or organization, a Rep graph can be generated using the algorithm presented in section 2.2. In this section, given a Spec graph and a Rep graph as the input to the mapping module, we present an efficient mapping algorithm which produces a sub-optimal matching of them in O(MN) time. The mapping procedure presented in this paper has a much lower time complexity than the traditional mappings since it contains a graph matching procedure for which both of the input graphs have been clustered.
In the following, we first present a set of preliminaries and then give a high-level description of the mapping algorithm.

3.1 Preliminaries

First we define the mapping function f_m : V_t → V_p. Following the precedence constraints and the computation and communication requirements of the original task graph, a schedule can be obtained which places each task module t_i onto processor f_m(t_i) at the proper time (earliest possible starting time). Since the edges of both task and system graphs are uniformly weighted, we assume that the communication time of the task graph edge (t_i, t_j) is equal to dist(f_m(t_i), f_m(t_j)), where dist(p_i, p_j) is the shortest distance between processors p_i and p_j. We also assume it takes no time to communicate data at the same processor, i.e., dist(p_i, p_i) = 0. A schedule can be illustrated by a Gantt chart which consists of a list of all processors and, for each processor, a list of all task modules allocated to that processor ordered by their execution time []. We define the total execution time of a schedule, T_m, to be the latest finishing computation time of the last scheduled task module on any processor. Obviously, T_m is equal to the total execution time of a given task on a given system. As we consider the shortest execution time of a given task on a system to be the ultimate

goal in mapping, we take T_m as our measure of quality to scale how good a mapping is. However, since T_m can only be calculated once a schedule has been obtained, it is difficult to predict T_m in the process of mapping. Therefore, we shall present another objective function as part of the mapping heuristic to guide the mapping process. This function is described in section 3.2.

3.2 Mapping Algorithm

A detailed description of the Cluster-M mapping algorithm is presented in Figure 5. In the following, we give an overview of the algorithm. Before starting the mapping, we need to compute a reduction factor denoted by f, which is essential to map task graphs having more nodes than the system graphs. The reduction factor f is the ratio of the total size of the Rep clusters over the total size of the Spec clusters. It is used to estimate how many computation nodes will share a processor. The mapping is done recursively at each clustering level, where we find the best matching between Spec clusters and Rep clusters. To map each of the Spec clusters (denoted by S_i) with size S_i, we search for the Rep cluster (denoted by R_j) with the best matched size, i.e., closest to f · S_i. Therefore, we try to minimize the function as formulated in Equation (1). If no Rep cluster with a matching size can be found for a Spec cluster, we either merge or split (unmerge) Rep clusters until a matching Rep cluster is found.

|f_m| = Σ_i |f · S_i − R_{f_m(S_i)}|    (1)

When the mapping at the top level is completed, for each pair of the Spec and Rep clusters, the same mapping procedure is continued recursively at a lower level until the mapping is fine-grained to the processor level. As the numbers of task modules and processors in the original task and system graphs are M and N respectively, the total numbers of all Spec and Rep clusters at all clustering levels are O(M) and O(N) respectively. Thus, the time complexity of sorting all the Spec and Rep clusters is O(M log M + N log N).
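One level of this size matching can be sketched as follows. This is a simplified illustration (names are ours) which only picks the closest available Rep size for each Spec cluster and omits the merging and splitting of Rep clusters described above:

```python
def match_level(spec_sizes, rep_sizes):
    """Match each Spec cluster to the unused Rep cluster whose size is
    closest to f * S_i, where f = (total Rep size) / (total Spec size),
    capped at 1. Returns f and the list of (S_i, matched Rep size) pairs."""
    f = min(1.0, sum(rep_sizes) / sum(spec_sizes))
    remaining = list(rep_sizes)
    matching = []
    for s in sorted(spec_sizes, reverse=True):   # larger Spec clusters first
        best = min(remaining, key=lambda r: abs(f * s - r))
        remaining.remove(best)
        matching.append((s, best))
    return f, matching

# The paper's example: Spec clusters of sizes 5 and 4, Rep clusters 5 and 3,
# giving f = 8/9; the size-5 Spec cluster matches the size-5 Rep cluster and
# the size-4 Spec cluster matches the size-3 Rep cluster.
f, matching = match_level([5, 4], [5, 3])
```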
In the mapping at each level, the time complexity of finding the best match of each Spec cluster is O(N), as the total number of clusters in the Rep graph is O(N). This includes the time in splitting a Rep cluster (which is also a recursive procedure) and inserting the extra part back into the sorted Rep cluster list. Suppose the total number of Spec clusters at level i is K_i. The time complexity of finding the best matches of all K_i Spec clusters at level i is thus O(K_i N). Since the total number of Spec clusters is O(M), i.e., Σ_i K_i = O(M), the total time complexity of this mapping algorithm is O(M log M + N log N + MN) = O(MN).

3.3 Mapping Examples

In section 2, we constructed a Spec graph and a Rep graph from the original task graph and system graph, as shown in Figures 2 and 4. Figure 6 shows the mapping from the obtained Spec graph to the Rep graph following the mapping algorithm described above. First, we calculate S = 9, R = 8 and f = 8/9. The Spec cluster of size 5 is matched to the Rep cluster of the same size; however, the Spec cluster of

Cluster-M Mapping Algorithm

Input:
  input Spec graph; sort all Spec clusters at each level in descending order of sizes
  input Rep graph; sort all Rep clusters at each level in descending order of sizes

Recursive Mapping Procedure:
  {for all Spec and Rep clusters at the top level}
  calculate f; if f > 1, let f = 1
  calculate the required size of the Rep cluster matching S_i to be f · S_i
  for each Spec cluster in the top-level sorted list, do begin
    if a Rep cluster of the required size is found, then
      match the Spec cluster to the Rep cluster
      delete the Spec and Rep cluster from the Spec and Rep lists
  end
  for each unmatched Spec cluster, do begin
    if the size of the first Rep cluster > the required size, then begin
      split the Rep cluster into two parts, with one part of the required size
      match the Spec cluster to this part
      insert the other part into the proper position of the sorted Rep cluster list
    end
    else begin
      merge Rep clusters until the sum of sizes ≥ the required size
      if the sum = the required size, then match the Spec cluster to the merged Rep cluster
      else begin
        split the merged Rep cluster into two parts, with one of the required size
        match the Spec cluster to this part
        insert the other part into the sorted Rep list
      end
    end
  end
  for each matching pair of Spec cluster and Rep cluster, do begin
    if the Rep cluster contains only one processor, then
      map all the modules in the Spec cluster to the processor
    else begin
      go to the sub-clusters of the Spec and Rep cluster (thus they are pushed to the top level)
      call the recursive mapping procedure for these clusters
    end
  end

Figure 5: Mapping Algorithm.

Figure 6: A mapping example (the Spec graph with member nodes a to i is mapped step by step onto the Rep graph of processors A to H, with f = 8/9).

Figure 7: The obtained mapping result.

size 4 has to be matched to the Rep cluster of size 3, since this is the closest size match. Then the same procedure is applied to the Spec clusters at the lower level. As shown in step 2 in Figure 6, task module a is mapped to Rep cluster H, which contains a single processor. In step 3, modules b, e, f, g, h and i are mapped to corresponding processors. Finally, in step 4, modules c and d are both mapped to processor F. Since modules j, k and l are embedded to module d (see Figure 2), they are also mapped to processor F, to which d is mapped. Similarly, modules m, n and o are mapped to processors D, A and E respectively. Now all the task modules in the original task graph have been mapped to corresponding processors. Figure 7 shows the final schedule obtained from the above mapping by following the data and operational precedence of the task graph. As we can see in the Gantt chart, T_m = 6.
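The total execution time T_m of such a schedule can be evaluated from the mapping under the paper's uniform assumptions (unit computation per module, communication cost equal to the processor distance). The following is a hedged sketch, not the paper's scheduler: it processes modules in a given precedence-respecting order and serializes modules placed on the same processor:

```python
def schedule_length(edges, mapping, dist, order):
    """Evaluate T_m for a mapping: each module takes unit time, an edge
    (u, v) costs dist[(p, q)] when u and v sit on distinct processors p and q,
    and modules sharing a processor run one after another."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)

    finish = {}           # finish time of each module
    free = {}             # earliest time each processor is free
    for t in order:       # order must respect the precedence constraints
        p = mapping[t]
        ready = max((finish[u] + (0 if mapping[u] == p else dist[(mapping[u], p)])
                     for u in parents.get(t, [])), default=0)
        finish[t] = max(ready, free.get(p, 0)) + 1
        free[p] = finish[t]
    return max(finish.values())

# Toy example: module a feeds b (same processor) and c (a neighbouring one).
dist = {(0, 1): 1, (1, 0): 1}
T_m = schedule_length([('a', 'b'), ('a', 'c')],
                      {'a': 0, 'b': 0, 'c': 1}, dist, ['a', 'b', 'c'])
```

In the toy example, a finishes at time 1, b at 2 on the same processor, and c at 3 after one unit of communication, so T_m = 3.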

Figure 8: Comparison example with Bokhari: task and system graph.

4 Comparison Results

In this section, we present a set of experimental results which have been obtained by comparing our algorithm with other leading techniques. The examples selected in this paper are the same as those presented and experimented with by the authors of the other leading techniques. The following five criteria are used for evaluating the performance of the algorithms examined: (1) the total time complexity of executing the mapping algorithm, T_c; (2) the total execution time of the generated mappings, T_m; (3) the speedup S_m = T_s/T_m, where T_s is the sequential execution time of the task; (4) the efficiency = S_m/N_m, where N_m is the number of processors used; and (5) the actual time of running the mapping algorithm on a certain computer, T_r. In the following, we present our comparison results for both the allocation and scheduling problems.

4.1 Allocating undirected task graphs

The goal of task allocation is to minimize the communication delay between processors and to balance the load among processors. The problem of task allocation arises when specifying the order of executing the task modules is not required. Therefore, the task graph in task allocation is undirected and the clustering-undirected-graphs algorithm is used to generate the Spec graph in this case. We consider the measure of mapping quality in task allocation to be T_m. We compare our results to Bokhari's mapping (allocation) algorithm [] using undirected task graphs. Bokhari's algorithm has a running time complexity of O(N ), while ours is O(MN). Bokhari's algorithm assumes that the computation amount of each task module, the amount of data communication along each task graph edge, the computation speed of each processor and the data transmission rate along each communication link are all uniform, i.e., 1.
It further assumes that the number of task modules is no greater than the number of processors, so that the mapping can be one to one. In this case, a lower bound on T_m can be d + 1, where d is the degree of a given task graph. In comparing Cluster-M with Bokhari we use the example shown in Figure 8, which has a -node task

Table 1: Mapping of Bokhari's algorithm and Cluster-M (the processor mapped to each task module by Bokhari's algorithm and by Cluster-M, together with T_m and T_r in seconds).

Table 2: Comparisons of mappings of Bokhari's algorithm and Cluster-M on random graphs (T_m for both algorithms against the lower bound, and T_r in seconds).

graph and a 6 × 6 finite element machine (FEM) []. A Sun SPARC station was used for the experiments. The results are shown in Table 1. Note that the running time of clustering the task graph and system graph by Cluster-M, which is 0.7 seconds, is not included in T_r, as our clustering is independent of the mapping. However, even if we included it, the running time of Cluster-M would still be many times faster than Bokhari's algorithm. Relative to the lower bound on T_m described before, both Cluster-M and Bokhari's algorithms obtained near-optimal results, with T_m = 7 for Cluster-M. The above example uses the same structured task and system graphs as tried in []. We have also tested other randomly generated task and system graphs. Table 2 shows the mapping results and comparisons for randomly generated task and system graphs. Similar results were obtained for the set of random graphs.

4.2 Scheduling directed task graphs

We first compare our algorithm with El-Rewini and Lewis's mapping heuristic (MH) algorithm [9]. The time complexity of MH is O(M N ), while ours has an O(MN) time complexity. Given a task graph and the system graph of a -cube as shown in Figure 9, the schedule obtained from MH is illustrated by a Gantt chart in Figure 10(a) [9]. Similarly, the Gantt chart of the schedule obtained by Cluster-M mapping is shown in Figure 10(b). An optimal schedule is also shown in Figure 10(c). We can see that both MH and Cluster-M mappings have produced close to optimal T_m for this example, yet Cluster-M is faster by a factor of O(MN ). We next compare with Lee and Aggarwal's mapping strategy [6]. Their mapping strategy considers the task graph as a directed graph and differentiates nodes and edges into different computation stages and communication phases, to accurately calculate the actual communication cost between two non-adjacent processors. However, it maps the entire task graph onto the system graph without graph contraction or clustering.
Also, it assumes the number of nodes in the task graph is no greater than that of the system graph. The time complexity of Lee and Aggarwal's algorithm is O(N ), while ours is O(MN) (if M = N, then ours is O(N²)). Given a task graph and the system graph of a 4-cube as shown in Figure 11, the comparison of the mapping results is shown in Figure 12. In this example, A_i = 1 for all i. The task graph for the second comparison example with Lee and Aggarwal is shown in Figure 13, where again A_i = 1 for all i. The system graph for this problem is a -cube. The mapping results are shown in Figure 14. Lee and Aggarwal's mapping strategy was later extended by Chaudhary and Aggarwal for mapping larger task graphs onto smaller system graphs [6]. The time complexity of this algorithm is O(M^4). Next, we compare our mapping results with Chaudhary and Aggarwal's. We present two examples. In the first example, the task graph of Figure 11 is mapped onto a -cube. The mapping results for this example are shown in Figure 15. In the second example, the task graph of Figure 13 is mapped onto a -cube. The mapping results for this example are shown in Figure 16. As we see in all the examples in this section, Cluster-M mapping has a superior running time and the results obtained are similar to or better than those from the other algorithms.
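The speedup and efficiency criteria used throughout these comparisons can be computed mechanically once T_m is known. A small sketch, assuming the uniform case where the sequential time T_s is simply the sum of the computation amounts (the function name and the example numbers are illustrative, not taken from the paper's tables):

```python
def mapping_metrics(task_amounts, T_m, N_m):
    """Speedup S_m = T_s / T_m with T_s = sum of computation amounts, and
    efficiency = S_m / N_m for the N_m processors actually used."""
    T_s = sum(task_amounts)
    S_m = T_s / T_m
    return S_m, S_m / N_m

# e.g. 15 unit-weight modules scheduled in T_m = 6 on 8 processors
S_m, eff = mapping_metrics([1] * 15, 6, 8)
```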

Figure 9: Comparison example with MH: task and system graph (the figure also gives the piecewise node weights A_i).

Figure 10: Comparison example with MH: mapping results (Gantt charts for (a) MH, (b) Cluster-M, and (c) an optimal schedule).

Figure 11: Comparison example with Lee and Aggarwal: task and system graph.

Figure 12: Comparison example with Lee and Aggarwal: mapping results ((a) Lee and Aggarwal, (b) Cluster-M, (c) optimal).

Figure 13: Comparison example with Lee and Aggarwal: task and system graph.

Figure 14: Comparison example with Lee and Aggarwal: mapping results ((a) Lee and Aggarwal, (b) Cluster-M, (c) optimal).

Figure 15: Comparison example with Chaudhary and Aggarwal: mapping results ((a) Chaudhary and Aggarwal, (b) Cluster-M, (c) optimal).

Figure 16: Comparison example with Chaudhary and Aggarwal: mapping results ((a) Chaudhary and Aggarwal, (b) Cluster-M, (c) optimal).

5 Conclusion

In this paper, we have presented and implemented a generic algorithm for mapping portable parallel algorithms onto various multiprocessor systems or computing organizations. The input to the mapping algorithm is a Spec graph, which corresponds to a clustered (layered) task graph, and a Rep graph, which corresponds to a clustered (partitioned) system graph. Unlike other mapping approaches, which cluster only the task graph, our Cluster-M based mapping algorithm requires a clustering of the system graph as well as the task graph before executing the mapping process. The clustering is done only once for a given task graph (system graph), independent of any system graphs (task graphs). It is a machine-independent (application-independent) clustering and therefore is not repeated for different mappings. The complexity of our mapping algorithm is O(MN), where M is the number of task modules and N is the number of processors. We presented experimental results comparing our implemented algorithm with others. Compared to other leading techniques, Cluster-M mapping finds better or similar results in much less time and uses fewer or equally many processors. Furthermore, this generic algorithm is suitable for both the allocation problem and the scheduling problem. This work has been extended to the case where both the task graph and the system graph are non-uniform [7].

References

[1] F. Berman and L. Snyder. On mapping parallel algorithms into parallel architectures. Journal of Parallel and Distributed Computing, 4:439–458, 1987.

[2] S. H. Bokhari. On the mapping problem. IEEE Trans. on Computers, C-30(3):207–214, March 1981.

[3] S. H. Bokhari. A shortest tree algorithm for optimal assignments across space and time in a distributed processor system. IEEE Trans. on Software Engineering, SE-7(6):583–589, November 1981.

[4] S. H. Bokhari. Partitioning problems in parallel, pipelined, and distributed computing. IEEE Trans. on Computers, 37(1):48–57, January 1988.

[5] T. L. Casavant and J. G. Kuhl. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans. on Software Engineering, 14(2):141–154, February 1988.

[6] V. Chaudhary and J. K. Aggarwal. A generalized scheme for mapping parallel algorithms. IEEE Trans. on Parallel and Distributed Systems, 4(3):328–346, March 1993.

[7] S. Chen, M. M. Eshaghian, and Y. Wu. Mapping arbitrary non-uniform task graphs onto arbitrary non-uniform system graphs. In Proc. International Conference on Parallel Processing, August 1995.

[8] E. G. Coffman and R. L. Graham. Optimal scheduling for two-processor systems. Acta Informatica, 1:200–213, 1972.

[9] H. El-Rewini and T. G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing, 9:138–153, 1990.

[10] H. El-Rewini, T. G. Lewis, and H. H. Ali. Task Scheduling in Parallel and Distributed Systems. Prentice Hall, 1994.

[11] F. Ercal, J. Ramanujam, and P. Sadayappan. Task allocation onto a hypercube by recursive mincut bipartitioning. Journal of Parallel and Distributed Computing, 10:35–44, 1990.

[12] M. M. Eshaghian and M. E. Shaaban. Cluster-M parallel programming paradigm. International Journal of High Speed Computing, 6(2):287–309, June 1994.

[13] D. Fernandez-Baca. Allocating modules to processors in a distributed system. IEEE Trans. on Software Engineering, 15(11):1427–1436, November 1989.

[14] A. Gerasoulis and T. Yang. A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors. Journal of Parallel and Distributed Computing, 16:276–291, 1992.

[15] S. J. Kim and J. C. Browne. A general approach to mapping of parallel computation upon multiprocessor architectures.
In Proc. International Conference on Parallel Processing, volume 3, pages 1–8, 1988.

[16] S. Lee and J. K. Aggarwal. A mapping strategy for parallel processing. IEEE Trans. on Computers, C-36(4):433–442, April 1987.

[17] V. M. Lo. Heuristic algorithms for task assignment in distributed systems. IEEE Trans. on Computers, 37(11):1384–1397, November 1988.

[18] V. M. Lo, S. Rajopadhye, S. Gupta, D. Keldsen, M. A. Mohamed, and J. A. Telle. OREGAMI: Software tools for mapping parallel computations to parallel architectures. In Proc. International Conference on Parallel Processing, 1990.

[19] C. McCreary and H. Gill. Automatic determination of grain size for efficient parallel processing. Communications of the ACM, 32(9):1073–1078, September 1989.

[20] R. Ponnusamy, N. Mansour, A. Choudhary, and G. C. Fox. Mapping realistic data sets on parallel computers. In Proc. 7th International Parallel Processing Symposium, April 1993.

[21] P. Sadayappan, F. Ercal, and J. Ramanujam. Cluster partitioning approaches to mapping parallel programs onto a hypercube. Parallel Computing, 13:1–16, 1990.

[22] V. Sarkar. Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors. MIT Press, 1989.

[23] C. Shen and W. Tsai. A graph matching approach to optimal task assignment in distributed computing systems using a minimax criterion. IEEE Trans. on Computers, C-34(3):197–203, March 1985.

[24] H. S. Stone. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Trans. on Software Engineering, SE-3(1):85–93, January 1977.

[25] M. Y. Wu and D. Gajski. Hypertool: A programming aid for message-passing systems. IEEE Trans. on Parallel and Distributed Systems, 1990.

[26] T. Yang and A. Gerasoulis. DSC: Scheduling parallel tasks on an unbounded number of processors. IEEE Trans. on Parallel and Distributed Systems, 5(9):951–967, September 1994.


More information

Richard E. Korf. June 27, Abstract. divide them into two subsets, so that the sum of the numbers in

Richard E. Korf. June 27, Abstract. divide them into two subsets, so that the sum of the numbers in A Complete Anytime Algorithm for Number Partitioning Richard E. Korf Computer Science Department University of California, Los Angeles Los Angeles, Ca. 90095 korf@cs.ucla.edu June 27, 1997 Abstract Given

More information

INDIAN STATISTICAL INSTITUTE

INDIAN STATISTICAL INSTITUTE INDIAN STATISTICAL INSTITUTE Mid Semestral Examination M. Tech (CS) - II Year, 202-203 (Semester - IV) Topics in Algorithms and Complexity Date : 28.02.203 Maximum Marks : 60 Duration : 2.5 Hours Note:

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL Jun Sun, Yasushi Shinjo and Kozo Itano Institute of Information Sciences and Electronics University of Tsukuba Tsukuba,

More information

Scheduling on clusters and grids

Scheduling on clusters and grids Some basics on scheduling theory Grégory Mounié, Yves Robert et Denis Trystram ID-IMAG 6 mars 2006 Some basics on scheduling theory 1 Some basics on scheduling theory Notations and Definitions List scheduling

More information

Proposed running head: Minimum Color Sum of Bipartite Graphs. Contact Author: Prof. Amotz Bar-Noy, Address: Faculty of Engineering, Tel Aviv Universit

Proposed running head: Minimum Color Sum of Bipartite Graphs. Contact Author: Prof. Amotz Bar-Noy, Address: Faculty of Engineering, Tel Aviv Universit Minimum Color Sum of Bipartite Graphs Amotz Bar-Noy Department of Electrical Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel. E-mail: amotz@eng.tau.ac.il. Guy Kortsarz Department of Computer Science,

More information

A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System

A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System 第一工業大学研究報告第 27 号 (2015)pp.13-17 13 A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System Kazuo Hajikano* 1 Hidehiro Kanemitsu* 2 Moo Wan Kim* 3 *1 Department of Information Technology

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

X(1) X. X(k) DFF PI1 FF PI2 PI3 PI1 FF PI2 PI3

X(1) X. X(k) DFF PI1 FF PI2 PI3 PI1 FF PI2 PI3 Partial Scan Design Methods Based on Internally Balanced Structure Tomoya TAKASAKI Tomoo INOUE Hideo FUJIWARA Graduate School of Information Science, Nara Institute of Science and Technology 8916-5 Takayama-cho,

More information

Assign auniquecodeto each state to produce a. Given jsj states, needed at least dlog jsje state bits. (minimum width encoding), at most jsj state bits

Assign auniquecodeto each state to produce a. Given jsj states, needed at least dlog jsje state bits. (minimum width encoding), at most jsj state bits State Assignment The problem: Assign auniquecodeto each state to produce a logic level description. Given jsj states, needed at least dlog jsje state bits (minimum width encoding), at most jsj state bits

More information

Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting

Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Natawut Nupairoj and Lionel M. Ni Department of Computer Science Michigan State University East Lansing,

More information

APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES

APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES A. Likas, K. Blekas and A. Stafylopatis National Technical University of Athens Department

More information

The Postal Network: A Versatile Interconnection Topology

The Postal Network: A Versatile Interconnection Topology The Postal Network: A Versatile Interconnection Topology Jie Wu Yuanyuan Yang Dept. of Computer Sci. and Eng. Dept. of Computer Science Florida Atlantic University University of Vermont Boca Raton, FL

More information

A 3. CLASSIFICATION OF STATIC

A 3. CLASSIFICATION OF STATIC Scheduling Problems For Parallel And Distributed Systems Olga Rusanova and Alexandr Korochkin National Technical University of Ukraine Kiev Polytechnical Institute Prospect Peremogy 37, 252056, Kiev, Ukraine

More information

Effective Use of Computational Resources in Multicore Distributed Systems

Effective Use of Computational Resources in Multicore Distributed Systems Effective Use of Computational esources in Multicore Distributed Systems Hidehiro Kanemitsu Media Network Center, Waseda University Email: kanemih@ruri.waseda.jp Hidenori Nakazato Masaki Hanada Department

More information

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and

More information

Wei Shu and Min-You Wu. Abstract. partitioning patterns, and communication optimization to achieve a speedup.

Wei Shu and Min-You Wu. Abstract. partitioning patterns, and communication optimization to achieve a speedup. Sparse Implementation of Revised Simplex Algorithms on Parallel Computers Wei Shu and Min-You Wu Abstract Parallelizing sparse simplex algorithms is one of the most challenging problems. Because of very

More information

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o Reconstructing a Binary Tree from its Traversals in Doubly-Logarithmic CREW Time Stephan Olariu Michael Overstreet Department of Computer Science, Old Dominion University, Norfolk, VA 23529 Zhaofang Wen

More information

System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics

System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics Chunho Lee, Miodrag Potkonjak, and Wayne Wolf Computer Science Department, University of

More information

18.3 Deleting a key from a B-tree

18.3 Deleting a key from a B-tree 18.3 Deleting a key from a B-tree B-TREE-DELETE deletes the key from the subtree rooted at We design it to guarantee that whenever it calls itself recursively on a node, the number of keys in is at least

More information

Minimum Makespan Assembly Plans. Giorgio Gallo and Maria Grazia Scutella. Dipartimento di Informatica, Universita di Pisa,

Minimum Makespan Assembly Plans. Giorgio Gallo and Maria Grazia Scutella. Dipartimento di Informatica, Universita di Pisa, Universita di Pisa Dipartimento di Informatica Technical Report : TR-98-0 Minimum Makespan Assembly Plans Giorgio Gallo, Maria Grazia Scutella' September, 998 ADDR: Corso Italia 40,565 Pisa,Italy. TEL:

More information

Partitioning Multimedia Objects for. Optimal Allocation in Distributed. Computing Systems. Kingsley C. Nwosu. IBM Corporation,

Partitioning Multimedia Objects for. Optimal Allocation in Distributed. Computing Systems. Kingsley C. Nwosu. IBM Corporation, Partitioning Multimedia Objects for Optimal Allocation in Distributed Computing Systems Kingsley C. Nwosu IBM Corporation, POWER Parallel Systems Development, Large Scale Computing Division, MS/992, Kingston,

More information

Static Multiprocessor Scheduling of Periodic Real-Time Tasks with Precedence Constraints and Communication Costs

Static Multiprocessor Scheduling of Periodic Real-Time Tasks with Precedence Constraints and Communication Costs Static Multiprocessor Scheduling of Periodic Real-Time Tasks with Precedence Constraints and Communication Costs Stefan Riinngren and Behrooz A. Shirazi Department of Computer Science and Engineering The

More information

Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression

Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression Helga Ingimundardóttir University of Iceland March 28 th, 2012 Outline Introduction Job Shop Scheduling

More information

Test Set Compaction Algorithms for Combinational Circuits

Test Set Compaction Algorithms for Combinational Circuits Proceedings of the International Conference on Computer-Aided Design, November 1998 Set Compaction Algorithms for Combinational Circuits Ilker Hamzaoglu and Janak H. Patel Center for Reliable & High-Performance

More information

Routing Multicast Streams in Clos Networks

Routing Multicast Streams in Clos Networks Routing Multicast Streams in Clos Networks De-Ron Liang Institute of Information Science Academia Sinica Taipei Taiwan59 R.O.C. drliang@iis.sinica.edu.tw Chen-Liang Fang Jin-Wen College of Business and

More information

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department CS473-Algorithms I Lecture 3-A Graphs Graphs A directed graph (or digraph) G is a pair (V, E), where V is a finite set, and E is a binary relation on V The set V: Vertex set of G The set E: Edge set of

More information

An ATM Network Planning Model. A. Farago, V.T. Hai, T. Cinkler, Z. Fekete, A. Arato. Dept. of Telecommunications and Telematics

An ATM Network Planning Model. A. Farago, V.T. Hai, T. Cinkler, Z. Fekete, A. Arato. Dept. of Telecommunications and Telematics An ATM Network Planning Model A. Farago, V.T. Hai, T. Cinkler, Z. Fekete, A. Arato Dept. of Telecommunications and Telematics Technical University of Budapest XI. Stoczek u. 2, Budapest, Hungary H-1111

More information

A static mapping heuristics to map parallel applications to heterogeneous computing systems

A static mapping heuristics to map parallel applications to heterogeneous computing systems CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2005; 17:1579 1605 Published online 24 June 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.902

More information

Henning Koch. Dept. of Computer Science. University of Darmstadt. Alexanderstr. 10. D Darmstadt. Germany. Keywords:

Henning Koch. Dept. of Computer Science. University of Darmstadt. Alexanderstr. 10. D Darmstadt. Germany. Keywords: Embedding Protocols for Scalable Replication Management 1 Henning Koch Dept. of Computer Science University of Darmstadt Alexanderstr. 10 D-64283 Darmstadt Germany koch@isa.informatik.th-darmstadt.de Keywords:

More information

Introduction to Parallel & Distributed Computing Parallel Graph Algorithms

Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI

More information

Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors

Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors YU-KWONG KWOK The University of Hong Kong AND ISHFAQ AHMAD The Hong Kong University of Science and Technology Static

More information