Mapping of Parallel Tasks to Multiprocessors with Duplication *


Gyung-Leen Park, Dept. of Comp. Sc. and Eng., Univ. of Texas at Arlington, Arlington, TX, gpark@cse.uta.edu
Behrooz Shirazi, Dept. of Comp. Sc. and Eng., Univ. of Texas at Arlington, Arlington, TX, shirazi@cse.uta.edu
Jeff Marquis, Prism Parallel Tech., Inc., N. Plano Rd., Richardson, Texas

Abstract

Duplication Based Scheduling (DBS) is a relatively new approach for solving multiprocessor scheduling problems. The problem is defined as finding an optimal schedule which minimizes the parallel execution time of an application on a target system. This paper proposes a new DBS algorithm which achieves considerable performance improvement over existing DBS algorithms of equal or lower time complexity. The proposed algorithm obtains performance comparable to DBS algorithms of higher complexity. The paper also proposes a variation of the proposed algorithm which adjusts the extent of duplication according to the limited number of processors available in the target system. Our simulation study reveals the gradual performance degradation of the proposed algorithm as the number of processors available in the system is decreased.

1. Introduction

Efficient scheduling of parallel programs, represented as Directed Acyclic Graphs (DAGs), onto the processing elements of parallel and distributed computer systems is an extremely difficult and important problem [10, 22-26, 29, 31]. The goals of the scheduling process are to utilize resources efficiently and to achieve the performance objectives of the application (e.g., to minimize the program's parallel execution time). It has been shown that the multiprocessor scheduling problem is NP-complete in its general forms. The typical approach to the problem is list scheduling, where tasks are put into lists according to priorities assigned by heuristics [1, 19]. Duplication Based Scheduling is a relatively new approach to the scheduling problem.
The DBS algorithms are capable of reducing communication overhead by duplicating remote parent tasks on local processing elements. Since DBS methods have also been shown to be NP-complete in their general forms [17], many of the proposed DBS algorithms are based on heuristics. This paper classifies DBS algorithms into two categories according to the task duplication approach used: Scheduling with Partial Duplication (SPD) and Scheduling with Full Duplication (SFD). SPD algorithms do not duplicate the parent of a join node unless the parent is critical. A join node is defined as a node with an in-degree greater than one (i.e., a node with more than one incoming edge). Instead, SPD algorithms try to find the primary iparent, defined later in this paper as an immediate parent which gives the largest start time to the join node. The join node is then scheduled on the processor where the primary iparent has been scheduled. Because of the limited task duplication, algorithms in this category have low complexity, but they may not be appropriate for systems with high communication overhead. They typically provide good schedules for input DAGs in which computation cost is strictly larger than communication cost. Critical Path Method (CPM) [5], Search and Duplication Based Scheduling (SDBS) [6], and Scalable Task Duplication Scheduling (STDS) [8] belong to this category. SFD algorithms attempt to duplicate all the parents of a join node and apply the task duplication algorithm to all the processors that hold any of the parents of the join node. Thus, algorithms in this category have higher complexity but typically show better performance than SPD algorithms. Duplication Scheduling Heuristic (DSH) [13], Bottom-up Top-down Duplication Heuristic (BTDH) [4], Linear Clustering with Task Duplication (LCTD) [3, 27], Critical Path Fast Duplication (CPFD) [2], and Economical Critical Path Fast Duplication (ECPFD) [14] belong to this category.
* This work has been supported in part by grants from NSF (CDA and MIPS) and the State of Texas ATP.

A trade-off exists between algorithms in these two categories: performance (better application parallel execution time) versus time complexity (longer time to

carry out the scheduling algorithm itself). This paper proposes a new DBS algorithm that attempts to achieve the performance of SFD algorithms with a time complexity approaching that of SPD algorithms. The proposed algorithm, called Duplication First and Reduction Next (DFRN), duplicates the parents of any join node, as done in SFD algorithms, but with reduced computational complexity. In general, most DBS algorithms, including DFRN, assume the availability of an unlimited number of processors, with few exceptions such as STDS [8] and ECPFD [14]. Since this assumption may not hold in practice, we also propose a variation of the DFRN algorithm, Scalable scheduling with DFRN (SDFRN), which adjusts the extent of duplication according to the limited number of processors available in the target system. The SDFRN algorithm with N available processors provides the same schedule as that obtained with an unbounded number of processors, where N is the number of task nodes in the input DAG. Our simulation study shows that the DFRN algorithm achieves considerable performance improvement over existing algorithms of equal or lower time complexity, while it obtains performance comparable to algorithms of higher time complexity. It is also shown that the performance improvement grows as the Communication to Computation Ratio (CCR) is increased. The simulation study also reveals the graceful performance degradation of the SDFRN algorithm as the number of available processors decreases. The remainder of this paper is organized as follows. Section 2 presents the system model and the problem definition. Section 3 briefly covers related work. The two proposed DBS algorithms, DFRN and SDFRN, are presented in Section 4. The performance of the DFRN algorithm is compared with that of the existing algorithms in Section 5. Section 5 also shows the effect of the number of available processors on the SDFRN algorithm. Finally, Section 6 concludes this paper.
2. System model and problem definition

A parallel program is usually represented by a Directed Acyclic Graph (DAG), also called a task graph. As defined in [6], a DAG consists of a tuple (V, E, T, C), where V, E, T, and C are the set of task nodes, the set of communication edges, the set of computation costs associated with the task nodes, and the set of communication costs associated with the edges, respectively. T(V_i) is the computation cost of task V_i, and C(V_i, V_j) is the communication cost of the edge E(V_i, V_j) which connects tasks V_i and V_j. The edge E(V_i, V_j) represents the precedence constraint between nodes V_i and V_j; in other words, task V_j can start execution only after the output of V_i is available to V_j. When the two tasks V_i and V_j are assigned to the same processor, C(V_i, V_j) is assumed to be zero, since intra-processor communication cost is negligible compared with interprocessor communication cost. The weights associated with nodes and edges are obtained by estimation [30]. This paper defines two relations for precedence constraints. The relation V_i → V_j indicates the strong precedence relation between V_i and V_j: V_i is an immediate parent of V_j and V_j is an immediate child of V_i. The terms iparent and ichild are used to represent immediate parent and immediate child, respectively. The relation V_i ⇒ V_j indicates the weak precedence relation between V_i and V_j: V_i is a parent of V_j but not necessarily an immediate one. V_i ⇒ V_j and V_j ⇒ V_k imply V_i ⇒ V_k. V_i → V_j and V_j → V_k do not imply V_i → V_k, but do imply V_i ⇒ V_k. The ⇒ relation is transitive; the → relation is not. A node without any parent is called an entry node and a node without any child is called an exit node. Graphically, a node is represented as a circle with a dividing line in the middle.
The number in the upper portion of the circle is the node's ID and the number in the lower portion is the node's computation cost. For example, for the sample DAG in Figure 1, the entry node is V_1, which has a computation cost of 10. In the graph representation of a DAG, the communication cost of each edge is written on the edge itself. For each node, the in-degree is the number of incoming edges and the out-degree is the number of outgoing edges. For example, in Figure 1, the in-degree and out-degree of node V_5 are 3 and 1, respectively. A few terms are defined here for a clearer presentation. Definition 1: A node is called a fork node if its out-degree is greater than 1. Definition 2: A node is called a join node if its in-degree is greater than 1. Note that fork node and join node are not exclusive terms: one node can be both a fork and a join node, i.e., both its in-degree and out-degree are greater than one. Similarly, a node can be neither a fork nor a join node, i.e., both its in-degree and out-degree are one. In the task graph of Figure 1, nodes V_1, V_2, V_3, and V_4 are fork nodes while nodes V_5, V_6, V_7, and V_8 are join nodes. In this example, all the fork nodes have out-degree three and all the join nodes have in-degree three.
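The DAG tuple (V, E, T, C) and the fork/join definitions above can be sketched directly in code; the following Python encoding, with made-up node IDs and costs, is illustrative rather than taken from the paper:

```python
from collections import defaultdict

# Hypothetical encoding of a DAG (V, E, T, C): comp[v] is T(v);
# comm[(u, v)] is C(u, v), and its keys double as the edge set E.
comp = {1: 10, 2: 20, 3: 30, 4: 60}
comm = {(1, 2): 50, (1, 3): 20, (1, 4): 40, (2, 4): 30, (3, 4): 25}

in_deg = defaultdict(int)
out_deg = defaultdict(int)
for (u, v) in comm:
    out_deg[u] += 1
    in_deg[v] += 1

fork_nodes = [v for v in comp if out_deg[v] > 1]   # Definition 1: out-degree > 1
join_nodes = [v for v in comp if in_deg[v] > 1]    # Definition 2: in-degree > 1
entry_nodes = [v for v in comp if in_deg[v] == 0]  # no parents
exit_nodes = [v for v in comp if out_deg[v] == 0]  # no children
```

In this toy graph V_1 is both an entry node and a fork node, and V_4 is both a join node and the exit node, illustrating that the terms are not exclusive.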

Figure 1. Sample DAG

Definition 3: The Earliest Start Time, EST(V_i, P_k), and Earliest Completion Time, ECT(V_i, P_k), are the times at which task V_i starts and finishes its execution on processor P_k, respectively.

Definition 4: The message arriving time (MAT) from V_i to V_j, or MAT(V_i, V_j), is the time at which the message from V_i arrives at V_j. If V_i and V_j are scheduled on the same processor P_k, MAT(V_i, V_j) becomes ECT(V_i, P_k). Otherwise, MAT(V_i, V_j) = ECT(V_i, P_k) + C(V_i, V_j).

Definition 5: An iparent of a join node is called its primary iparent if it provides the largest MAT to the join node. The primary iparent is denoted V_i = PIP(V_j) if V_i is the primary iparent of V_j. More formally, V_i = PIP(V_j) if and only if MAT(V_i, V_j) ≥ MAT(V_k, V_j) for all V_k with V_k → V_j, where V_i → V_j and i ≠ k. If more than one iparent provides the same largest MAT, the PIP is chosen arbitrarily.

Definition 6: An immediate parent of a join node is called the secondary iparent of the join node if it provides the second largest MAT to the join node. The secondary iparent is denoted V_i = SIP(V_j) if V_i is the secondary iparent of V_j. Formally, V_i = SIP(V_j) if and only if MAT(V_i, V_j) ≥ MAT(V_k, V_j) for all V_k with V_k → V_j and V_k ≠ PIP(V_j), where V_i → V_j, V_i ≠ PIP(V_j), and i ≠ k. If more than one iparent provides the same second largest MAT, the SIP is chosen arbitrarily. EST(V_j, P_c) becomes Max(ECT(PIP(V_j), P_c), MAT(SIP(V_j), V_j)) if V_j is scheduled without any task duplication on the processor P_c where PIP(V_j) has been scheduled.

Definition 7: The processor holding the primary iparent of V_i is called the primary processor of V_i.

Definition 8: The critical path of a task graph is the path from an entry node to an exit node which has the largest sum of computation and communication costs of the nodes and edges on the path.
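Definitions 4 through 6 translate into a few lines of code. In this sketch the completion times, processor assignments, and communication costs are hypothetical values standing in for a partial schedule:

```python
# ECT and processor of already-scheduled iparents of a join node V_5 (made up).
ect = {1: 10, 2: 30, 3: 40}
proc = {1: 'P1', 2: 'P2', 3: 'P3'}
comm = {(1, 5): 100, (2, 5): 80, (3, 5): 90}   # C(V_i, V_5)

def mat(vi, vj, pk):
    """Definition 4: message arriving time from vi to vj if vj runs on pk."""
    if proc[vi] == pk:
        return ect[vi]                  # same processor: communication is free
    return ect[vi] + comm[(vi, vj)]

def pip_sip(vj, pk, iparents):
    """Definitions 5 and 6: iparents with the largest and second-largest MAT."""
    ranked = sorted(iparents, key=lambda vi: mat(vi, vj, pk), reverse=True)
    return ranked[0], ranked[1]
```

With these numbers, scheduling V_5 on a fresh processor gives MATs of 110, 110, and 130 for V_1, V_2, and V_3, so V_3 is the primary iparent and the tie for secondary iparent is broken arbitrarily.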
The Critical Path Including Communication cost (CPIC) is the length of the critical path including the communication costs on the path, while the Critical Path Excluding Communication cost (CPEC) is its length excluding the communication costs. For example, the critical path of the sample graph in Figure 1 consists of nodes V_1, V_4, V_7, and V_8. Then CPIC is T(V_1) + C(V_1, V_4) + T(V_4) + C(V_4, V_7) + T(V_7) + C(V_7, V_8) + T(V_8), which is 400. CPEC is T(V_1) + T(V_4) + T(V_7) + T(V_8), which is 150.

Definition 9: The level of a node is defined recursively as follows. The level of an entry node, V_0, is zero: Lv(V_0) = 0. For a non-join node V_j with V_i → V_j, Lv(V_j) = Lv(V_i) + 1. For a join node V_j, Lv(V_j) = Max(Lv(V_i)) + 1 over all V_i → V_j. For example, the levels of nodes V_1, V_2, V_5, and V_8 are 0, 1, 2, and 3, respectively. Even though there is an edge from node V_1 to node V_5, the level of node V_5 is still 2, not 1, since Lv(V_5) = Max(Lv(V_i)) + 1 over all V_i → V_5 for the join node V_5.

As in existing DBS algorithms, the number of processors is assumed to be unbounded. This assumption is relaxed when we present the scalable version of the DBS algorithm in Section 4.2. The topology of the target system is also assumed to be a complete graph, i.e., all processors can communicate directly with each other. This assumption may be justified by noting that, with current technologies such as wormhole routing, the distance between processors is no longer an important factor in communication delays. Thus, the multiprocessor scheduling process becomes a mapping of the task nodes in the input DAG to the processors in the target system, with the goal of minimizing the execution time of the entire program. The execution time of the entire program after scheduling is called the parallel time, to distinguish it from the completion time of an individual task node.
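Definition 9 and the critical-path lengths are straightforward recursions. The sketch below uses a small hypothetical DAG, not the one in Figure 1:

```python
import functools

comp = {1: 10, 2: 20, 3: 30, 4: 60, 5: 50}                     # T(v)
comm = {(1, 2): 50, (1, 3): 20, (1, 5): 40, (2, 4): 30,
        (3, 4): 25, (4, 5): 35}                                # C(u, v)

parents = {}
for (u, v) in comm:
    parents.setdefault(v, []).append(u)

@functools.lru_cache(maxsize=None)
def level(v):
    # Definition 9: entry nodes are level 0; a child is one level below
    # its deepest iparent (the max handles join nodes).
    ps = parents.get(v, [])
    return 0 if not ps else 1 + max(level(p) for p in ps)

@functools.lru_cache(maxsize=None)
def cpic(v):
    # Longest path ending at v, counting node and edge weights (CPIC);
    # dropping the comm term would give CPEC instead.
    ps = parents.get(v, [])
    return comp[v] + (max(cpic(p) + comm[(p, v)] for p in ps) if ps else 0)
```

Note that node 5 has level 3 even though the edge (1, 5) exists, mirroring the V_1-to-V_5 example above.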
3. Related work

This section briefly covers several typical scheduling algorithms from the literature. They are used later in this paper for performance comparison.

3.1. Heavy Node First (HNF) algorithm

The HNF algorithm [21] assigns the nodes in a DAG to processors level by level. At each level, the scheduler selects the eligible nodes in descending order of computational weight, with the heaviest node (i.e., the node with the largest computation cost) selected first. A node is selected arbitrarily if multiple nodes at the same level have the same computation cost. The selected node is assigned to the processor which gives it the earliest start time.

3.2. Linear Clustering (LC) algorithm

The LC algorithm [12] is a traditional critical-path-based clustering method. The scheduler identifies the critical path, removes the nodes on the path from the DAG, and assigns them to a linear cluster. The process is repeated until no task node remains in the DAG. Each cluster is then scheduled onto a processor.

3.3. Scalable Task Duplication based Scheduling (STDS) algorithm

The STDS algorithm [8] first calculates the start time and the completion time of each node by traversing the input DAG. The algorithm then generates clusters by performing a depth-first search starting from the exit node. During the task assignment process, only critical tasks, i.e., those essential to establish a path from a particular node to the entry node, are duplicated. The algorithm has a small complexity because of the limited duplication. If the number of processors available is less than the number needed, the algorithm executes a processor reduction procedure. In this paper, an unbounded number of processors is used for STDS in the performance comparison of Section 5.

3.4. Critical Path Fast Duplication (CPFD) algorithm

The CPFD algorithm [2] classifies the nodes in a DAG into three categories: Critical Path Node (CPN), In-Branch Node (IBN), and Out-Branch Node (OBN). A CPN is a node on the critical path. An IBN is a node from which there is a path to a CPN. An OBN is a node which is neither a CPN nor an IBN.
CPFD tries to schedule CPNs first. If there is any unscheduled IBN for a CPN, CPFD traces the IBN and schedules it first. OBNs are scheduled after all the IBNs and CPNs have been scheduled. The motivation behind CPFD is that the parallel time is likely to be reduced by scheduling CPNs first. Performance comparisons show that CPFD outperforms DSH and BTDH in most cases [2].

3.5. Comparison

We have classified the existing DBS algorithms into two categories: SPD (Scheduling with Partial Duplication) and SFD (Scheduling with Full Duplication). Both SPD and SFD approaches duplicate a fork node if the ichild of the fork node is not a join node. On the other hand, the SPD approach does not duplicate any parent of a join node except the PIP, while the SFD approach tries to duplicate all the parents. Naturally, there exists a trade-off between better performance (smaller parallel time of the application, typically achieved by SFD algorithms) and better running time (smaller time to carry out the scheduling process itself, typically achieved by SPD algorithms). Our goal is to introduce a new task duplication scheduling algorithm with performance better than, but a running time comparable to, the SPD algorithms. Table I summarizes the time complexity of these algorithms and indicates the class each belongs to (i.e., whether it is an SPD or SFD algorithm). Note that, for a DAG with V nodes, all the SFD algorithms have a complexity of O(V^4) while the SPD algorithms have a complexity of O(V^2).

Table I. Comparison of scheduling algorithms

  SCHEDULER   CLASSIFICATION   COMPLEXITY
  HNF         non-DBS          O(V log V)
  LC          non-DBS          O(V^3)
  DSH         SFD              O(V^4)
  BTDH        SFD              O(V^4)
  CPM         SPD              O(V^2)
  SDBS        SPD              O(V^2)
  STDS        SPD              O(V^2)
  LCTD        SFD              O(V^4)
  CPFD        SFD              O(V^4)
  ECPFD       SFD              O(V^4)

As an illustration, Figure 2 presents the schedules obtained by each algorithm for the sample DAG of Figure 1.
In this example, P_i represents processing element i; PT is the Parallel Time of the DAG; and [EST(V_i, P_k), i, ECT(V_i, P_k)] represents the earliest start time and earliest completion time of task i. For example, in Figure 2(a), task V_1 starts its execution at time 0 and finishes at time 10 on processor P_1.

P1: [0, 1, 10] [10, 4, 70] [190, 7, 260] [260, 8, 270]
P2: [60, 3, 90] [170, 6, 230]
P3: [60, 2, 80] [160, 5, 210]
(a) Schedule by HNF (PT = 270)

P1: [0, 1, 10] [10, 4, 70] [140, 7, 210] [210, 8, 220]
P2: [0, 1, 10] [10, 3, 40]
P3: [0, 1, 10] [10, 2, 30]
P4: [0, 1, 10] [10, 4, 70] [100, 6, 160]
P5: [0, 1, 10] [10, 4, 70] [110, 5, 160]
(b) Schedule by STDS (PT = 220)

P1: [0, 1, 10] [10, 4, 70] [190, 7, 260] [260, 8, 270]
P2: [60, 3, 90] [120, 5, 170]
P3: [60, 2, 80] [170, 6, 230]
(c) Schedule by LC (PT = 270)

P1: [0, 1, 10] [10, 4, 70] [70, 3, 100] [110, 7, 180] [180, 8, 190]
P2: [0, 1, 10] [10, 3, 40]
P3: [0, 1, 10] [10, 2, 30]
P4: [0, 1, 10] [10, 4, 70] [70, 3, 100] [100, 6, 160]
P5: [0, 1, 10] [10, 4, 70] [70, 3, 100] [100, 5, 150]
(d) Schedule by DFRN (PT = 190)

P1: [0, 1, 10] [10, 4, 70] [70, 3, 100] [100, 5, 150]
P2: [0, 1, 10] [10, 3, 40] [40, 4, 100] [110, 7, 180] [180, 8, 190]
P3: [0, 1, 10] [10, 2, 30] [30, 4, 90] [100, 6, 160]
(e) Schedule by CPFD (PT = 190)

Figure 2. Schedules by various schedulers

4. The proposed algorithms

This section presents the two proposed algorithms, DFRN and SDFRN. The motivations and high-level descriptions of the DFRN and SDFRN algorithms are given in Section 4.1 and Section 4.2, respectively. The worst-case performance and optimality analysis are also provided in this section.

4.1. DFRN

4.1.1. Motivation. When we ran existing scheduling algorithms on a DAG with about 400 nodes, we observed that an SFD algorithm takes about 50 minutes to generate a schedule while an SPD algorithm takes less than one second. We need a scheduler with performance better than SPD algorithms but with a running time adequate for applications consisting of a large number of tasks. This need became our goal, and the goal was achieved by employing a new task duplication approach called DFRN (Duplication First and Reduction Next). The DFRN approach behaves the same as the SPD and SFD approaches in handling fork nodes, but differs in handling join nodes.
An SFD algorithm recursively estimates the effect of a possible duplication and decides whether to duplicate each node one by one. As a consequence, for a DAG with V nodes, each node may be considered V times for duplication in the worst case. Unlike the SFD approach, DFRN first duplicates all parent nodes in a bottom-up fashion, up to the parent which has already been scheduled on the same processor, without estimating the effect of the duplications. Then each duplicated task is removed if it does not meet certain conditions. Also, SFD algorithms are applied to all the processors on which any iparent of the join node has been scheduled. We observed that, after the duplication process, the completion time of the join node on the primary processor was in most cases shorter than on other processors. Thus, DFRN applies the duplication only to the primary processor, with the expectation that the primary processor is the best candidate for the join node. These two differences provide a far shorter running time than, but comparable performance to, SFD algorithms, as shown in Section 5. At the same time, the DFRN approach also achieves considerable performance improvement over SPD approaches.

4.1.2. Description of the proposed algorithm. The high-level description of the DFRN algorithm is shown in Figure 3. In this figure, the notations P_c, P_u, PIP, IP, LN, and JN denote the primary processor, an unused processor, the primary iparent, an iparent, the last node, and a join node, respectively. In addition, a new term used in the algorithm is defined first.

Definition 10: At any step of the scheduling process, the last node of processor P_i is the most recent node assigned to P_i. In Figure 2(a), the last nodes of P_1, P_2, and P_3 are V_8, V_6, and V_5, respectively.

The term iparent, as used in the algorithm of Figure 3, indicates the iparent image which has the minimum EST if there is more than one iparent image across different processors. For example, in Figure 2.
(d), V_3 on P_2 is identified as the iparent of its ichild, since EST(V_3, P_2) = 10 while EST(V_3, P_k) = 70 for k = 1, 4, and 5. The primary iparent and the primary processor are used in the same way. For example, V_3 on P_2 is identified as the primary iparent if V_3 is the primary iparent of any node; the primary processor is P_2 in this case. Note that the algorithm is presented in a generic form so that any list scheduling algorithm can be used as the node selection algorithm. The node selection algorithm

decides which node is considered first. HNF is used as the node selection algorithm in this paper. In step (1), initialize() reads the input DAG and identifies the level of each node. All the nodes at the same level are sorted in descending order of their computation costs (as per the HNF heuristic). Step (2) considers each node according to the priority given in step (1). The node under consideration, V_i, may or may not be a join node. Steps (3) through (10) handle non-join nodes. The iparent in step (4) may or may not be the last node. If the iparent is the last node, V_i is scheduled after the iparent, as shown in step (6). If the iparent is not the last node, the tasks scheduled on the processor up to the iparent are copied (i.e., duplicated) onto an unused processor, as shown in step (8). Then V_i is scheduled onto the unused processor in step (9), making the EST of V_i equal to the ECT of the iparent. Otherwise, the EST of V_i would be increased by the computation time of the tasks between the iparent and the last node in the schedule. If V_i is a join node, the primary iparent of V_i and the primary processor are identified in step (12). DFRN is applied to the join node in step (14) or (17), after the last-node check is handled in the same way. DFRN(P_a, V_i) consists of two procedures, try_duplication(P_a, V_i) and try_deletion(P_a, V_i), as shown in steps (21) and (22). try_duplication(P_a, V_i) first tries to duplicate the iparent giving the largest MAT to V_i. The procedure recursively searches iparents from V_i in a bottom-up fashion until it finds a parent which has already been scheduled on P_a, as shown in steps (24) and (25). When it finds such a parent on P_a, it stops the search and duplicates the parents found so far, as shown in step (27). As a result, V_i is duplicated before V_j when V_i ⇒ V_j, and try_deletion(P_a, V_i) considers each duplicated node one by one in the same sequence.
After the duplication step, try_deletion(P_a, V_i) decides whether to delete any of the duplicated tasks based on the two conditions in step (30). The first condition covers the case when the output of the duplicated task is available earlier through a message from the task's copy on another processor than from the duplicated task itself; the duplicated task is deleted since the duplication is not necessary. The second condition covers the case when the duplication no longer decreases EST(V_i, P_c). By the second condition, the EST of any node obtained by the DFRN algorithm is guaranteed to be less than or equal to the EST of the same node obtained by SPD algorithms, since the second condition results in EST(V_i, P_c) ≤ MAT(SIP(V_i), V_i), while EST(V_i, P_c) = MAT(SIP(V_i), V_i) in the SPD approach, assuming ECT(PIP(V_i), P_c) ≤ MAT(SIP(V_i), V_i). The parallel time obtained from DFRN is likewise less than or equal to that from an SPD algorithm, since the parallel time is the largest ECT over all the nodes in the DAG.

Scheduling algorithm with DFRN
(1) initialize() // build a priority queue using HNF
(2) for each node V_i in the queue // in FIFO manner
(3)   if V_i is not a JN // V_i has only one IP
(4)     identify the IP
(5)     if the IP is the LN
(6)       schedule V_i to the PE having the IP
(7)     else // the IP is not the LN
(8)       copy the schedule up to the IP onto P_u // now the IP is the LN on P_u
(9)       schedule V_i to P_u
(10)   endif
(11) else // V_i is a join node
(12)   identify PIP and P_c
(13)   if PIP is the LN
(14)     DFRN(P_c, V_i) // apply DFRN to P_c
(15)   else // PIP is not the LN
(16)     copy the schedule up to PIP onto P_u
(17)     DFRN(P_u, V_i) // apply DFRN to P_u
(18)   endif
(19) endif
(20) end for

DFRN(P_a, V_i)
(21) try_duplication(P_a, V_i)
(22) try_deletion(P_a, V_i)

try_duplication(P_a, V_i)
(23) for each V_p with MAT(V_p, V_i) ≥ MAT(V_q, V_i), V_p → V_i, V_q → V_i, p ≠ q, where V_p and V_q are not on P_a yet // from the node giving the largest MAT to the node giving the smallest MAT
(24)   if there is any V_x such that MAT(V_x, V_p) ≥ MAT(V_y, V_p), V_x → V_p, V_y → V_p, x ≠ y, where V_x and V_y are not on P_a yet // some IP of V_p is not scheduled on P_a
(25)     try_duplication(P_a, V_x) // trace the IP which is not on P_a
(26)   else // all of its IPs are scheduled on P_a
(27)     schedule V_p onto P_a // duplicate the IP which is not on P_a
(28)   endif
(29) end for

try_deletion(P_a, V_i)
(30) delete any duplicated task V_k if
     (i) ECT(V_k, P_a) > MAT(V_k, V_d), where V_d is the ichild of V_k for which V_k was duplicated, or
     (ii) ECT(V_k, P_a) > MAT(SIP(V_i), V_i)

Figure 3. Description of the DFRN algorithm
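The reduction half of DFRN, i.e. try_deletion with conditions (i) and (ii) of step (30), can be sketched as follows. This is a deliberately simplified illustration with hypothetical inputs, not the authors' implementation: each duplicate carries its completion time on P_a, and it is deleted when a message from the original copy would arrive no later, or when it no longer improves on the secondary iparent's MAT:

```python
def try_deletion(duplicated, mat_from_original, mat_sip):
    """Keep a duplicate only if neither deletion condition of step (30) holds."""
    kept = []
    for task, ect_on_pa in duplicated:
        if ect_on_pa > mat_from_original[task]:  # (i) the original's message wins
            continue
        if ect_on_pa > mat_sip:                  # (ii) no further EST reduction
            continue
        kept.append(task)
    return kept

# Duplicates of V2 and V3 on the primary processor, with made-up times.
dups = [('V2', 40), ('V3', 90)]
mats = {'V2': 60, 'V3': 70}   # MAT if the original copy sent a message instead
print(try_deletion(dups, mats, mat_sip=80))   # prints ['V2']: V3 fails both tests
```

The duplicates are visited in the order they were created (ancestors first), mirroring the sequence used by try_duplication.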

The dominant part of the algorithm is DFRN(P_a, V_i). Since try_duplication(P_a, V_i) duplicates parents in order of MAT, the sorting takes O(V^2), which makes the complexity of the routine O(V^2). try_deletion(P_a, V_i) also takes O(V^2) time, since it considers deletion m times and takes O(p) time to recalculate EST(V_i, P_a) whenever a node is deleted, where m is the number of duplicated tasks and p is the number of deleted iparents of the node, with m ≤ V and p ≤ V. Thus the complexity of try_deletion(P_a, V_i) is O(V^2). The overall complexity becomes O(V^3), since DFRN(P_a, V_i) is executed q times, where q is the number of join nodes in the DAG, q ≤ V.

4.1.3. Analysis of the proposed algorithm. The worst-case analysis of the proposed algorithm is especially important for real-time systems, which are an important application area of parallel processing. The proposed algorithm has the following two properties. Due to space limitations, we omit the proofs; interested readers are referred to [18].

1. The worst-case parallel time obtained by the proposed algorithm for any input DAG is guaranteed to be less than or equal to CPIC.
2. The proposed algorithm always provides an optimal schedule for tree-structured input DAGs.

4.2. SDFRN (Scalable scheduling with DFRN)

4.2.1. Motivation. Most DBS algorithms, including DFRN, assume that an unlimited number of processors is available for scheduling, since this assumption makes the design of DBS algorithms simpler. Recently, scalable DBS algorithms [8, 14] have been considered due to the limited number of processors in real-world situations. We propose a scalable DFRN algorithm which adjusts the extent of duplication according to the number of processors available in the target system.

4.2.2. Description of the proposed algorithm. Scalability can be achieved by adding one condition to the algorithm of Figure 3. The DFRN algorithm requires an unused processor for the duplications in steps (8) and (16), as shown in Figure 3.
By inserting a condition which checks the availability of an unused processor at these two lines, the DFRN algorithm can adjust the extent of duplication according to the number of processors available. In other words, the SDFRN algorithm executes the duplications in lines (8) and (16) only if there is an unused processor available in the system. Since the algorithm is executed once for each node, N processors are enough to guarantee all the necessary duplications, where N is the number of task nodes in the input DAG. That is, the two properties shown above are still valid for the SDFRN algorithm if at least N processors are available in the target system. For illustration, Figure 4 contains the schedules obtained by the SDFRN algorithm with various numbers of processors for the sample DAG of Figure 1. In this example, when we try to use more than 5 processors, the algorithm limits the number of processors to 5; thus, the cases with more than 5 processors are not shown in this figure.

P1: [0, 1, 10] [10, 4, 70] [70, 3, 100] [110, 7, 180] [180, 8, 190]
P2: [0, 1, 10] [10, 3, 40]
P3: [0, 1, 10] [10, 2, 30]
P4: [0, 1, 10] [10, 4, 70] [70, 3, 100] [100, 6, 160]
P5: [0, 1, 10] [10, 4, 70] [70, 3, 100] [100, 5, 150]
(a) Schedule by SDFRN with 5 processors (PT = 190)

P1: [0, 1, 10] [10, 4, 70] [70, 3, 100] [110, 7, 180] [180, 5, 230] [230, 8, 240]
P2: [0, 1, 10] [10, 3, 40]
P3: [0, 1, 10] [10, 2, 30]
P4: [0, 1, 10] [10, 4, 70] [70, 3, 100] [100, 6, 160]
(b) Schedule by SDFRN with 4 processors (PT = 240)

P1: [0, 1, 10] [10, 4, 70] [70, 3, 100] [110, 7, 180] [180, 6, 240] [240, 5, 290] [290, 8, 300]
P2: [0, 1, 10] [10, 3, 40]
P3: [0, 1, 10] [10, 2, 30]
(c) Schedule by SDFRN with 3 processors (PT = 300)

P1: [0, 1, 10] [10, 4, 70] [70, 2, 90] [90, 3, 120] [120, 7, 190] [190, 6, 250] [250, 5, 300] [300, 8, 310]
P2: [0, 1, 10] [10, 3, 40]
(d) Schedule by SDFRN with 2 processors (PT = 310)

Figure 4. Schedules obtained by SDFRN with various numbers of available processors.
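The single change that turns DFRN into SDFRN, gating steps (8) and (16) on processor availability, can be sketched with two hypothetical helpers:

```python
def next_unused_processor(used, total):
    """Return a fresh processor index, or None when the pool is exhausted."""
    return len(used) if len(used) < total else None

def maybe_duplicate(used, total):
    """SDFRN gate for steps (8) and (16): duplicate only if a P_u exists."""
    pu = next_unused_processor(used, total)
    if pu is None:
        return None      # pool exhausted: skip the duplication, schedule in place
    used.add(pu)
    return pu            # DFRN behaviour: copy the schedule onto P_u
```

With a pool of at least N processors the gate never fires, so SDFRN reproduces the unbounded-processor schedule, consistent with the claim above.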
5. Performance comparison

We generated 1000 random DAGs to compare the performance of DFRN with that of existing scheduling algorithms. We used two parameters whose effects we were interested in investigating: the number of nodes and the CCR (Communication to Computation Ratio). The numbers of nodes used are 20, 40, 60, 80, and 100, while the CCR values used are 0.1, 0.5, 1.0, 5.0, and 10.0. CCR is the ratio of the average communication cost to the average computation cost. 40 DAGs were generated for each of the 25 combinations, giving 1000 DAGs in total. The scheduling techniques presented in Section 3 are used for comparison.
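A random-DAG generator of the kind used in such studies might look like the sketch below; the cost ranges, edge fan-in, and seeding are assumptions for illustration, not the paper's actual generator:

```python
import random

def random_dag(n, ccr, seed=0):
    """Random DAG whose average communication cost is roughly ccr times the
    average computation cost. Node IDs are topologically ordered by design."""
    rng = random.Random(seed)
    comp = {v: rng.randint(10, 50) for v in range(n)}   # T(v)
    avg_comp = sum(comp.values()) / n
    comm = {}
    for v in range(1, n):
        # Connect each node to at least one earlier node, keeping the graph acyclic.
        for u in rng.sample(range(v), k=min(v, rng.randint(1, 3))):
            comm[(u, v)] = max(1, round(rng.uniform(0.5, 1.5) * ccr * avg_comp))
    return comp, comm

comp, comm = random_dag(20, ccr=5.0)
```

Because every edge points from a lower ID to a higher ID, the result is guaranteed to be acyclic with a single pass.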

For the performance comparison, we define a normalized performance measure named Relative Parallel Time (RPT), which is the ratio of the parallel time to CPEC. For example, if the parallel time obtained by DFRN is 200 and CPEC is 100, the RPT of DFRN is 2.0. A smaller RPT value indicates a shorter parallel time. The RPT of any scheduling algorithm cannot be lower than one, since CPEC is a lower bound. One of our objectives is to observe the trade-off between performance (the parallel time obtained) and running time (the time taken to generate a schedule) among the scheduling algorithms. Table II shows the actual average running times of the five algorithms. The running time is the user time obtained by the time command on a Sun Sparc10 workstation. For an input DAG with 400 nodes, the time taken to obtain a schedule was 5.97 seconds for HNF, 0.34 seconds for STDS, 2.95 minutes for LC, 17.3 seconds for DFRN, and 46.4 minutes for CPFD. There are significant differences among the running times.

Table II. Comparison of running times (in seconds)

  N   HNF   STDS   LC   CPFD   DFRN

Table III shows the result of the comparison between each pair of algorithms. Each entry of the table consists of three elements in the format > a, = b, < c, meaning that the algorithm in the row provides a longer parallel time than the algorithm in the column a times, the same parallel time b times, and a shorter parallel time c times. For example, to compare DFRN and HNF, we look up DFRN in the fifth row and HNF in the first column, or vice versa. In this case the entry is > 2, = 22, < 976, which means that over the 1000 randomly generated DAGs, DFRN provides a longer parallel time than HNF 2 times, the same parallel time 22 times, and a shorter parallel time 976 times. The comparison shows that using DFRN instead of HNF shortens the parallel time in 97.6% of the cases.
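The pairwise tallies in Table III reduce to a simple counting routine; the parallel times below are toy values for illustration only:

```python
def compare(pt_row, pt_col):
    """Count how often the row algorithm's parallel time is longer (>),
    equal (=), or shorter (<) than the column algorithm's, per DAG."""
    longer = sum(1 for a, b in zip(pt_row, pt_col) if a > b)
    same = sum(1 for a, b in zip(pt_row, pt_col) if a == b)
    shorter = sum(1 for a, b in zip(pt_row, pt_col) if a < b)
    return longer, same, shorter

# Toy parallel times for two schedulers over five DAGs.
dfrn = [190, 200, 180, 210, 190]
hnf = [270, 200, 260, 240, 185]
print(compare(dfrn, hnf))   # prints (1, 1, 3)
```

The three counts always sum to the number of DAGs compared, which is why each row-column pair of Table III totals 1000.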
Comparing DFRN with LC, which has the same complexity as DFRN, DFRN generates a shorter parallel time 829 times, the same parallel time 171 times, and a longer parallel time 0 times, while the running time of DFRN was shorter than that of LC. We also confirmed that the parallel time obtained by DFRN is always less than CPIC in the 1000 runs. Against CPFD, DFRN generates a shorter parallel time 27 times, the same parallel time 685 times, and a longer parallel time 288 times. Note that DFRN provides the same parallel time as CPFD in 68.5% of the cases with only a small fraction of the running time of CPFD, which shows the effectiveness of the DFRN approach. Given the incomparably long running time of CPFD, DFRN is a good candidate for application programs consisting of a large number of tasks. For a DAG with a very large number of nodes, STDS is appropriate because of its very short running time.

Table III. Comparison of parallel times (row vs. column, "> a = b < c" format)

         HNF               STDS              LC                CPFD              DFRN
HNF      = 1000            > 885 = 48 < 67   > 587 = 39 < 374  > 978 = 22 < 0    > 976 = 22 < 2
STDS     > 67 = 48 < 885   = 1000            > 27 = 165 < 808  > 575 = 425 < 0   > 567 = 430 < 3
LC       > 374 = 39 < 587  > 808 = 165 < 27  = 1000            > 829 = 171 < 0   > 829 = 171 < 0
CPFD     > 0 = 22 < 978    > 0 = 425 < 575   > 0 = 171 < 829   = 1000            > 27 = 685 < 288
DFRN     > 2 = 22 < 976    > 3 = 430 < 567   > 0 = 171 < 829   > 288 = 685 < 27  = 1000

Graphical representations of the performance comparison are shown in Figure 5 and Figure 6 with respect to N (the number of nodes) and CCR, respectively. Each case in Figure 5 is an average of 200 runs over the varying CCR values; the average CCR value is 3.3. As shown in Figure 5, the number of nodes does not significantly affect the relative performance of the scheduling algorithms: the comparison shows similar patterns regardless of N. In that pattern, DFRN shows a much shorter parallel time than existing algorithms with equal or lower time complexity, while showing performance comparable to CPFD. CCR, in contrast, is a critical parameter.
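The "> a = b < c" entries of Table III are simple pairwise tallies over the test DAGs. A minimal sketch, using hypothetical parallel times purely for illustration:

```python
def pairwise_entry(times_row, times_col):
    """Tally a Table III entry: over all test DAGs, count how often the
    row algorithm's parallel time is longer than, equal to, or shorter
    than the column algorithm's."""
    longer = sum(r > c for r, c in zip(times_row, times_col))
    same = sum(r == c for r, c in zip(times_row, times_col))
    shorter = sum(r < c for r, c in zip(times_row, times_col))
    return longer, same, shorter

# Hypothetical parallel times for five DAGs (illustration only).
hnf = [10, 12, 9, 15, 11]
dfrn = [8, 12, 7, 13, 11]
a, b, c = pairwise_entry(dfrn, hnf)
print(f"> {a} = {b} < {c}")  # > 0 = 2 < 3
```

Note that the three counts always sum to the number of test DAGs, which is why every entry of Table III sums to 1000, and that swapping the arguments transposes the entry.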
As CCR increases, the performance gap becomes larger, as shown in Figure 6. The differences among the five algorithms are negligible for CCR values up to one, but when CCR is 5, the RPTs of HNF, STDS, LC, DFRN, and CPFD become 3.38, 2.57, 3.61, 1.67, and 1.61, respectively. When CCR is 10, they are 5.79, 5.01, 7.68, 2.45, and 2.27, respectively. As expected, duplication-based scheduling algorithms show considerable performance improvement for DAGs with high CCR values.

Figure 5. Comparison with respect to N

Figure 6. Comparison with respect to CCR

Figure 7 and Figure 8 show the performance degradation of the SDFRN algorithm as the number of available processors decreases. In these figures, aN denotes a target system with a x N available processors; for example, 0.9N indicates that the number of available processors is 90% of the number of task nodes. The figures show that, in most cases, SDFRN achieves performance comparable to that of DFRN with an unbounded number of processors until the number of available processors is reduced to 60% of the number of task nodes. The performance degradation becomes significant as the number of processors decreases below 50% of the number of nodes. The values of N and CCR do not significantly change the pattern of the performance degradation.

Figure 7. Performance degradation with respect to N (CCR = 3.3, D = 3.5)

Figure 8. Performance degradation with respect to CCR (N = 100, D = 3.8)

6. Conclusion

This paper classified existing DBS algorithms into two categories, SPD and SFD algorithms, according to the duplication method used for a join node: SFD algorithms try to duplicate all the iparents of a join node, while SPD algorithms do not. As a result, an SFD algorithm outperforms an SPD algorithm, but its running time is incomparably longer. This paper presented a new duplication-based scheduling algorithm, DFRN, which combines the good features of the two approaches. The motivation is to duplicate the iparents of a join node when the duplication reduces the earliest start time of the join node, as done in SFD algorithms, but without adding much complexity, so that the new approach is well suited for applications consisting of a large number of tasks.
In general, most DBS algorithms, including the proposed one, assume the availability of an unlimited number of processors. Since this assumption may not hold in practice, we also proposed a variation of DFRN, called SDFRN, which adjusts the extent of duplication according to the limited number of processors available in real-world situations. SDFRN shows the same performance as DFRN if at least N processors are available in the target system. Our performance study showed that DFRN has a running time comparable to SPD and non-duplicating scheduling algorithms, while outperforming them by generating schedules with much shorter parallel times. Compared to SFD algorithms, DFRN offers comparable performance with a running time that is several orders of magnitude shorter. The study also revealed the graceful performance degradation of the SDFRN algorithm as the number of processors available in the system decreases. Since the performance comparison study is based on random DAGs, we are currently investigating the characteristics of DAGs from real applications.

References

[1] T. L. Adam, K. Chandy, and J. Dickson, "A Comparison of List Scheduling for Parallel Processing Systems," Communications of the ACM, vol. 17, no. 12, Dec. 1974.
[2] I. Ahmad and Y.-K. Kwok, "A New Approach to Scheduling Parallel Programs Using Task Duplication," Proc. Int'l Conf. on Parallel Processing, vol. II, Aug. 1994.
[3] H. Chen, B. Shirazi, and J. Marquis, "Performance Evaluation of a Novel Scheduling Method: Linear Clustering with Task Duplication," Proc. Int'l Conf. on Parallel and Distributed Systems, Dec. 1993.
[4] Y. C. Chung and S. Ranka, "Application and Performance Analysis of a Compile-Time Optimization Approach for List Scheduling Algorithms on Distributed-Memory Multiprocessors," Proc. Supercomputing '92, Nov. 1992.
[5] J. Y. Colin and P. Chretienne, "C.P.M. Scheduling with Small Communication Delays and Task Duplication," Operations Research, 1991.
[6] S. Darbha and D. P. Agrawal, "SDBS: A Task Duplication Based Optimal Scheduling Algorithm," Proc. Scalable High Performance Computing Conf., May 1994.
[7] S. Darbha and D. P. Agrawal, "A Fast and Scalable Scheduling Algorithm for Distributed Memory Systems," Proc. Symp. on Parallel and Distributed Processing, Oct. 1995.
[8] S. Darbha, "Task Scheduling Algorithms for Distributed Memory Systems," PhD Thesis, North Carolina State Univ.
[9] S. Darbha and D. P. Agrawal, "Scalable Scheduling Algorithm for Distributed Memory Machines," Proc. Symp. on Parallel and Distributed Processing, Oct. 1996.
[10] H. El-Rewini, T. G. Lewis, and H. H. Ali, Task Scheduling in Parallel and Distributed Systems, New York: Prentice Hall.
[11] A. Gerasoulis and T. Yang, "A Comparison of Clustering Heuristics for Scheduling DAGs on Multiprocessors," J. Parallel and Distributed Computing, vol. 16, no. 4, Dec. 1992.
[12] S. J. Kim and J. C. Browne, "A General Approach to Mapping of Parallel Computation upon Multiprocessor Architectures," Proc. Int'l Conf. on Parallel Processing, vol. III, 1988.
[13] B. Kruatrachue and T. G. Lewis, "Grain Size Determination for Parallel Processing," IEEE Software, Jan. 1988.
[14] Y.-K. Kwok and I. Ahmad, "Exploiting Duplication to Minimize the Execution Times of Parallel Programs on Message-Passing Systems," Proc. Symp. on Parallel and Distributed Processing, Oct. 1994.
[15] Y.-K. Kwok, I. Ahmad, and J. Gu, "FAST: A Low-Complexity Algorithm for Efficient Scheduling of DAGs on Parallel Processors," Proc. Int'l Conf. on Parallel Processing, vol. II, Aug. 1996.
[16] C. McCreary and H. Gill, "Automatic Determination of Grain Size for Efficient Parallel Processing," Comm. ACM, vol. 32, Sept. 1989, pp. 1073-1078.
[17] C. H. Papadimitriou and M. Yannakakis, "Towards an Architecture-Independent Analysis of Parallel Algorithms," ACM Proc. Symp. on Theory of Computing (STOC), 1988.
[18] G.-L. Park, B. Shirazi, and J. Marquis, "Employing Task Duplication for Multiprocessor Scheduling," Tech. Report, Dept. of Computer Science and Engineering, University of Texas.
[19] G.-L. Park, B. Shirazi, J. Marquis, and H. Choo, "Decisive Path Scheduling: A New List Scheduling Method," Proc. 26th Int'l Conf. on Parallel Processing, Chicago, USA, Aug. 1997.
[20] V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, Cambridge, Mass.: MIT Press.
[21] B. Shirazi, M. Wang, and G. Pathak, "Analysis and Evaluation of Heuristic Methods for Static Task Scheduling," Journal of Parallel and Distributed Computing, vol. 10, no. 3, 1990.
[22] B. Shirazi and A. R. Hurson, "Scheduling and Load Balancing: Guest Editors' Introduction," Journal of Parallel and Distributed Computing, Dec. 1992.
[23] B. Shirazi and A. R. Hurson, "A Mini-track on Scheduling and Load Balancing: Track Coordinator's Introduction," Hawaii Int'l Conf. on System Sciences (HICSS-26), Jan. 1993.
[24] B. Shirazi, K. Kavi, A. R. Hurson, and P. Biswas, "PARSA: A PARallel Program Scheduling and Assessment Environment," Proc. Int'l Conf. on Parallel Processing, vol. II, Aug. 1993.
[25] B. Shirazi, H. B. Chen, K. Kavi, J. Marquis, and A. R. Hurson, "PARSA: A Parallel Programs Software Development Tool," Proc. Symp. on Assessment of Quality Software Development Tools, June 1994.
[26] B. Shirazi, A. R. Hurson, and K. Kavi, Scheduling and Load Balancing, IEEE Press.
[27] B. Shirazi, H.-B. Chen, and J. Marquis, "Comparative Study of Task Duplication Static Scheduling versus Clustering and Non-Clustering Techniques," Concurrency: Practice and Experience, vol. 7, no. 5, Aug. 1995.
[28] T. Yang and A. Gerasoulis, "DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors," IEEE Trans. on Parallel and Distributed Systems, vol. 5, no. 9, Sept. 1994.
[29] T. Yang and A. Gerasoulis, "A Dedicated Track on Partitioning and Scheduling for Parallel and Distributed Computation," Hawaii Int'l Conf. on Systems Sciences, Jan.
[30] M. Y. Wu and D. D. Gajski, "Hypertool: A Programming Aid for Message-Passing Systems," IEEE Trans. on Parallel and Distributed Systems, vol. 1, no. 3, Jul. 1990.
[31] M. Y. Wu, "A Dedicated Track on Program Partitioning and Scheduling in Parallel and Distributed Systems," Hawaii Int'l Conf. on Systems Sciences, Jan.


More information

arxiv: v2 [cs.ds] 22 Jun 2016

arxiv: v2 [cs.ds] 22 Jun 2016 Federated Scheduling Admits No Constant Speedup Factors for Constrained-Deadline DAG Task Systems Jian-Jia Chen Department of Informatics, TU Dortmund University, Germany arxiv:1510.07254v2 [cs.ds] 22

More information

A Localized Algorithm for Reducing the Size of Dominating Set in Mobile Ad Hoc Networks

A Localized Algorithm for Reducing the Size of Dominating Set in Mobile Ad Hoc Networks A Localized Algorithm for Reducing the Size of Dominating Set in Mobile Ad Hoc Networks Yamin Li and Shietung Peng Department of Computer Science Hosei University Tokyo 18-858 Japan {yamin, speng}@k.hosei.ac.jp

More information

Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically Redundant Manipulators

Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically Redundant Manipulators 56 ICASE :The Institute ofcontrol,automation and Systems Engineering,KOREA Vol.,No.1,March,000 Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically

More information

Achieving Distributed Buffering in Multi-path Routing using Fair Allocation

Achieving Distributed Buffering in Multi-path Routing using Fair Allocation Achieving Distributed Buffering in Multi-path Routing using Fair Allocation Ali Al-Dhaher, Tricha Anjali Department of Electrical and Computer Engineering Illinois Institute of Technology Chicago, Illinois

More information

Energy-Constrained Scheduling of DAGs on Multi-core Processors

Energy-Constrained Scheduling of DAGs on Multi-core Processors Energy-Constrained Scheduling of DAGs on Multi-core Processors Ishfaq Ahmad 1, Roman Arora 1, Derek White 1, Vangelis Metsis 1, and Rebecca Ingram 2 1 University of Texas at Arlington, Computer Science

More information

Programmers of parallel computers regard automated parallelprogramming

Programmers of parallel computers regard automated parallelprogramming Automatic Parallelization Ishfaq Ahmad Hong Kong University of Science and Technology Yu-Kwong Kwok University of Hong Kong Min-You Wu, and Wei Shu University of Central Florida The authors explain the

More information

A Framework for Space and Time Efficient Scheduling of Parallelism

A Framework for Space and Time Efficient Scheduling of Parallelism A Framework for Space and Time Efficient Scheduling of Parallelism Girija J. Narlikar Guy E. Blelloch December 996 CMU-CS-96-97 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523

More information

Lecture 9: Load Balancing & Resource Allocation

Lecture 9: Load Balancing & Resource Allocation Lecture 9: Load Balancing & Resource Allocation Introduction Moler s law, Sullivan s theorem give upper bounds on the speed-up that can be achieved using multiple processors. But to get these need to efficiently

More information

Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints

Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints Jörg Dümmler, Raphael Kunis, and Gudula Rünger Chemnitz University of Technology, Department of Computer Science,

More information

A Modular Genetic Algorithm for Scheduling Task Graphs

A Modular Genetic Algorithm for Scheduling Task Graphs A Modular Genetic Algorithm for Scheduling Task Graphs Michael Rinehart, Vida Kianzad, and Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer

More information

An Effective Load Balancing Task Allocation Algorithm using Task Clustering

An Effective Load Balancing Task Allocation Algorithm using Task Clustering An Effective Load Balancing Task Allocation using Task Clustering Poornima Bhardwaj Research Scholar, Department of Computer Science Gurukul Kangri Vishwavidyalaya,Haridwar, India Vinod Kumar, Ph.d Professor

More information

On the Hardness of Counting the Solutions of SPARQL Queries

On the Hardness of Counting the Solutions of SPARQL Queries On the Hardness of Counting the Solutions of SPARQL Queries Reinhard Pichler and Sebastian Skritek Vienna University of Technology, Faculty of Informatics {pichler,skritek}@dbai.tuwien.ac.at 1 Introduction

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

CASCH: A Software Tool for Automatic Parallelization and Scheduling of Programs on Message-Passing Multiprocessors

CASCH: A Software Tool for Automatic Parallelization and Scheduling of Programs on Message-Passing Multiprocessors CASCH: A Software Tool for Automatic Parallelization and Scheduling of Programs on Message-Passing Multiprocessors Abstract Ishfaq Ahmad 1, Yu-Kwong Kwok 2, Min-You Wu 3, and Wei Shu 3 1 Department of

More information

APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH

APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH Daniel Wespetal Computer Science Department University of Minnesota-Morris wesp0006@mrs.umn.edu Joel Nelson Computer Science Department University

More information

Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes

Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes Xiaojie Zhang and Paul H. Siegel University of California, San Diego 1. Introduction Low-density parity-check (LDPC) codes

More information

is the Capacitated Minimum Spanning Tree

is the Capacitated Minimum Spanning Tree Dynamic Capacitated Minimum Spanning Trees Raja Jothi and Balaji Raghavachari Department of Computer Science, University of Texas at Dallas Richardson, TX 75083, USA raja, rbk @utdallas.edu Abstract Given

More information

URSA: A Unified ReSource Allocator for Registers and Functional Units in VLIW Architectures

URSA: A Unified ReSource Allocator for Registers and Functional Units in VLIW Architectures Presented at IFIP WG 10.3(Concurrent Systems) Working Conference on Architectures and Compliation Techniques for Fine and Medium Grain Parallelism, Orlando, Fl., January 1993 URSA: A Unified ReSource Allocator

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Embedding Large Complete Binary Trees in Hypercubes with Load Balancing

Embedding Large Complete Binary Trees in Hypercubes with Load Balancing JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 35, 104 109 (1996) ARTICLE NO. 0073 Embedding Large Complete Binary Trees in Hypercubes with Load Balancing KEMAL EFE Center for Advanced Computer Studies,

More information

Foundations of Discrete Mathematics

Foundations of Discrete Mathematics Foundations of Discrete Mathematics Chapter 12 By Dr. Dalia M. Gil, Ph.D. Trees Tree are useful in computer science, where they are employed in a wide range of algorithms. They are used to construct efficient

More information

QoS-Aware Hierarchical Multicast Routing on Next Generation Internetworks

QoS-Aware Hierarchical Multicast Routing on Next Generation Internetworks QoS-Aware Hierarchical Multicast Routing on Next Generation Internetworks Satyabrata Pradhan, Yi Li, and Muthucumaru Maheswaran Advanced Networking Research Laboratory Department of Computer Science University

More information

Subset sum problem and dynamic programming

Subset sum problem and dynamic programming Lecture Notes: Dynamic programming We will discuss the subset sum problem (introduced last time), and introduce the main idea of dynamic programming. We illustrate it further using a variant of the so-called

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is

More information

An Efficient Method for Constructing a Distributed Depth-First Search Tree

An Efficient Method for Constructing a Distributed Depth-First Search Tree An Efficient Method for Constructing a Distributed Depth-First Search Tree S. A. M. Makki and George Havas School of Information Technology The University of Queensland Queensland 4072 Australia sam@it.uq.oz.au

More information

Hierarchical Intelligent Cuttings: A Dynamic Multi-dimensional Packet Classification Algorithm

Hierarchical Intelligent Cuttings: A Dynamic Multi-dimensional Packet Classification Algorithm 161 CHAPTER 5 Hierarchical Intelligent Cuttings: A Dynamic Multi-dimensional Packet Classification Algorithm 1 Introduction We saw in the previous chapter that real-life classifiers exhibit structure and

More information

Distributed Clustering Method for Large-Scaled Wavelength Routed Networks

Distributed Clustering Method for Large-Scaled Wavelength Routed Networks Distributed Clustering Method for Large-Scaled Wavelength Routed Networks Yukinobu Fukushima Graduate School of Information Science and Technology, Osaka University - Yamadaoka, Suita, Osaka 60-08, Japan

More information

An algorithm for Performance Analysis of Single-Source Acyclic graphs

An algorithm for Performance Analysis of Single-Source Acyclic graphs An algorithm for Performance Analysis of Single-Source Acyclic graphs Gabriele Mencagli September 26, 2011 In this document we face with the problem of exploiting the performance analysis of acyclic graphs

More information

Comprehensive Solution for Anomaly-free BGP

Comprehensive Solution for Anomaly-free BGP Comprehensive Solution for Anomaly-free BGP Ravi Musunuri, Jorge A. Cobb Department of Computer Science, The University of Texas at Dallas, Richardson, TX-7083-0688 musunuri, cobb @utdallas.edu Abstract.

More information

Efficiently Utilizing ATE Vector Repeat for Compression by Scan Vector Decomposition

Efficiently Utilizing ATE Vector Repeat for Compression by Scan Vector Decomposition Efficiently Utilizing ATE Vector Repeat for Compression by Scan Vector Decomposition Jinkyu Lee and Nur A. Touba Computer Engineering Research Center University of Teas, Austin, TX 7872 {jlee2, touba}@ece.uteas.edu

More information

An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation

An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation 230 The International Arab Journal of Information Technology, Vol. 6, No. 3, July 2009 An Empirical Performance Study of Connection Oriented Time Warp Parallel Simulation Ali Al-Humaimidi and Hussam Ramadan

More information

Power and Locality Aware Request Distribution Technical Report Heungki Lee, Gopinath Vageesan and Eun Jung Kim Texas A&M University College Station

Power and Locality Aware Request Distribution Technical Report Heungki Lee, Gopinath Vageesan and Eun Jung Kim Texas A&M University College Station Power and Locality Aware Request Distribution Technical Report Heungki Lee, Gopinath Vageesan and Eun Jung Kim Texas A&M University College Station Abstract With the growing use of cluster systems in file

More information

The Relationship between Slices and Module Cohesion

The Relationship between Slices and Module Cohesion The Relationship between Slices and Module Cohesion Linda M. Ott Jeffrey J. Thuss Department of Computer Science Michigan Technological University Houghton, MI 49931 Abstract High module cohesion is often

More information

Parallel Traveling Salesman. PhD Student: Viet Anh Trinh Advisor: Professor Feng Gu.

Parallel Traveling Salesman. PhD Student: Viet Anh Trinh Advisor: Professor Feng Gu. Parallel Traveling Salesman PhD Student: Viet Anh Trinh Advisor: Professor Feng Gu Agenda 1. Traveling salesman introduction 2. Genetic Algorithm for TSP 3. Tree Search for TSP Travelling Salesman - Set

More information

3 Competitive Dynamic BSTs (January 31 and February 2)

3 Competitive Dynamic BSTs (January 31 and February 2) 3 Competitive Dynamic BSTs (January 31 and February ) In their original paper on splay trees [3], Danny Sleator and Bob Tarjan conjectured that the cost of sequence of searches in a splay tree is within

More information

ADAPTIVE SORTING WITH AVL TREES

ADAPTIVE SORTING WITH AVL TREES ADAPTIVE SORTING WITH AVL TREES Amr Elmasry Computer Science Department Alexandria University Alexandria, Egypt elmasry@alexeng.edu.eg Abstract A new adaptive sorting algorithm is introduced. The new implementation

More information

A Preprocessor Approach to Persistent C++ Cem Evrendilek, Asuman Dogac and Tolga Gesli

A Preprocessor Approach to Persistent C++ Cem Evrendilek, Asuman Dogac and Tolga Gesli A Preprocessor Approach to Persistent C++ Cem Evrendilek, Asuman Dogac and Tolga Gesli Software Research and Development Center Scientific and Technical Research Council of Türkiye E-Mail:asuman@vm.cc.metu.edu.tr

More information

Optimizing Data Scheduling on Processor-In-Memory Arrays y

Optimizing Data Scheduling on Processor-In-Memory Arrays y Optimizing Data Scheduling on Processor-In-Memory Arrays y Yi Tian Edwin H.-M. Sha Chantana Chantrapornchai Peter M. Kogge Dept. of Computer Science and Engineering University of Notre Dame Notre Dame,

More information