On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems.


Andrei Rădulescu    Arjan J. C. van Gemund
Faculty of Information Technology and Systems
Delft University of Technology
P.O. Box 5031, 2600 GA Delft, The Netherlands

Abstract

This paper presents a novel heuristic, called Fast Critical Path (FCP), intended as a compile-time scheduling algorithm for distributed-memory systems. While similar to existing list scheduling algorithms, FCP has two important differences: (a) it does not sort all the tasks at the beginning, but maintains only a limited number of tasks sorted at any given time, and (b) instead of considering all processors as possible targets for a given task, the choice is restricted to either the processor from which the last message to the given task arrives or the processor which becomes idle the earliest. As a result, the time complexity is drastically reduced to O(V log(P) + E), where V and E are the number of tasks and edges in the task graph, respectively, and P is the number of processors. We demonstrate through theory and experiments that FCP performs comparably to existing list scheduling algorithms of much higher complexity.

1 Introduction

Efficient scheduling is essential to obtain high performance from a parallel program. If the structure of the program and the target machine are known in advance, scheduling can be done automatically at compile time, thus saving considerable overhead at run-time. Task scheduling on distributed-memory systems is a tradeoff between exploiting as much parallelism as possible and, at the same time, reducing communication, so as to minimize the parallel completion time of the program. Except for very restricted cases, the scheduling problem has been shown to be NP-complete [2]. Therefore, for realistic cases, scheduling is performed using heuristics.
Moreover, in order to be of practical use for large parallel applications, scheduling heuristics must have a low complexity. [Footnote: This research is part of the Automap project granted by the Netherlands Computer Science Foundation (SION) with financial support from the Netherlands Organization for Scientific Research (NWO) under grant number SION-2519/. To appear in the 1999 ACM International Conference on Supercomputing, June 1999, Rhodes, Greece.]

For shared-memory systems, it has been proven that even a low-cost scheduling heuristic is guaranteed to produce acceptable performance [3]. In the distributed-memory case, however, such a guarantee does not exist, and the scheduling problem remains a challenge, especially for algorithms where low cost is of principal interest.

Scheduling can be done either for a bounded or for an unbounded number of processors. Although attractive from both a cost and a performance perspective, scheduling for an unbounded number of processors is rarely of practical use, because the required number of processors is usually not available. Hence, such algorithms typically find their application within multi-step scheduling methods for a bounded number of processors [7, 8, 11]. Scheduling for a bounded number of processors can be done either with duplication (e.g., DSH [5] or CPFD [1]) or without duplication (e.g., MCP [10], ETF [4] or DSC-LLB [7, 11]). Duplicating tasks results in better performance, but significantly increases cost compared to non-duplicating heuristics. Among non-duplicating heuristics, list scheduling algorithms obtain good performance at a low cost [6, 7]. However, when compiling large programs for large systems, the complexity of current list scheduling approaches is still prohibitive.

This paper presents a new compile-time task scheduling algorithm, called Fast Critical Path (FCP). The main objective is to reduce the scheduling costs as much as possible, while maintaining performance comparable to existing list scheduling algorithms.
The FCP algorithm is inspired by list scheduling algorithms which statically compute task priorities. The algorithms in this class already have a low complexity and good output performance. In our approach, we drastically reduce this complexity, while maintaining equivalent performance. Moreover, FCP obtains better results than multi-step scheduling, at an even lower complexity.

This paper is organized as follows: The next section describes the scheduling problem and introduces some definitions used in the paper. Section 3 briefly reviews some well-known scheduling algorithms. Section 4 presents the FCP algorithm, while Section 5 describes its performance. Section 6 concludes the paper.

2 Preliminaries

A parallel program can be modeled by a directed acyclic graph G = (V, E), where V is the set of nodes and E is the set of edges (we also use V and E to denote the number of nodes and edges, respectively). A node in the DAG represents a task, containing instructions that execute sequentially without preemption. Each task t has a weight comp(t) associated with it, which represents the computation cost of executing the task. The edges correspond to task dependencies (communication messages or precedence constraints). Each edge (t, t') has a weight comm(t, t') associated with it, which represents the communication cost incurred to satisfy the dependence. The communication-to-computation ratio (CCR) of a parallel program is defined as the ratio between its average communication cost and its average computation cost. A task with no input edges is called an entry task, while a task with no output edges is called an exit task. A task is said to be ready if all its parents have finished their execution. A task can start its execution only after all its dependencies have been satisfied. If two tasks are mapped to the same processor, the communication cost between them is assumed to be zero.

As a distributed system, we assume a set P of P processors connected in a homogeneous clique topology. Interprocessor communication is performed without contention, and tasks are executed without preemption. Once scheduled, a task t is associated with a processor PE(t), a start time ST(t) and a finish time FT(t). If the task is not scheduled, these three values are not defined. The processor idle time of a given processor p on a partial schedule is defined as the finish time of the last task scheduled on that processor: PIT(p) = max over {t ∈ V : PE(t) = p} of FT(t). The objective of the scheduling problem is to find a schedule of the tasks in V on the target system such that the parallel completion time (schedule length) is minimized. The parallel completion time is defined as T_par = max over {p ∈ P} of PIT(p).
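As an illustration of these definitions, the following minimal sketch (hypothetical names, not taken from the paper) models a task graph with a given schedule and computes PIT(p) and T_par from it.

```python
# Minimal sketch of the scheduling model of Section 2 (names hypothetical).
# A task graph is a DAG with computation costs comp(t) and edge costs comm(t, t').

comp = {"a": 2, "b": 3, "c": 1}          # computation cost per task
comm = {("a", "b"): 4, ("a", "c"): 2}    # communication cost per edge

# A schedule assigns each task a processor PE(t) and a start time ST(t).
PE = {"a": 0, "b": 0, "c": 1}
ST = {"a": 0, "b": 2, "c": 4}            # c starts at FT(a) + comm(a, c) = 4

# Finish time: FT(t) = ST(t) + comp(t).
FT = {t: ST[t] + comp[t] for t in comp}

def pit(p):
    """Processor idle time: finish time of the last task scheduled on p."""
    return max((FT[t] for t in comp if PE[t] == p), default=0)

# Parallel completion time (schedule length): T_par = max over processors of PIT(p).
t_par = max(pit(p) for p in set(PE.values()))
print(t_par)  # both processors finish at time 5
```

Note that b, mapped to the same processor as a, pays no communication cost, while c, mapped to the other processor, can start only once the message from a arrives.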
3 Related Work

In this section, four existing scheduling algorithms and their characteristics are described: (a) three list scheduling algorithms, Modified Critical Path (MCP) [10], Critical Path Method (CPM) [9] and Earliest Task First (ETF) [4], and (b) a multi-step method (DSC-LLB) composed of Dominant Sequence Clustering (DSC) [11] and List-Based Load Balancing (LLB) [7].

3.1 MCP

The MCP algorithm is a list scheduling algorithm in which task priorities are based on the latest possible start times of the tasks. The latest possible start time is computed as the difference between the critical path length of the graph and the length of the longest path from the current task to any exit task. A path length is the sum of the execution times and communication costs of the tasks and edges belonging to the path. The critical path is the longest path in the graph. A task with the smallest latest possible start time has the highest priority. Ties are broken by considering the priorities of the task's descendants. The tasks are selected in the order of their priorities, and each is assigned to the processor that can execute it the earliest. The time complexity of MCP is O(V²(log(V) + P)). MCP is relatively fast compared with other list scheduling algorithms. Furthermore, its scheduling performance has been shown to be superior to most of the other algorithms for a bounded number of processors [6, 7]. MCP can be modified to run faster by choosing a random tie-breaking scheme, at a negligible loss of performance. In this case, the time complexity is reduced to O(V log(V) + (E + V)P).

3.2 CPM

Like MCP, the CPM algorithm also uses the latest possible start times as static task priorities. However, the processor selection is different: the task with the highest priority is scheduled on the processor that becomes idle the earliest. The time complexity of CPM is O(V(log(V) + log(P))). CPM was originally designed without taking communication into consideration. In the case of non-zero communication delays, the scheduling performance of CPM decreases significantly. The reason is that CPM does not try to reduce communication costs, but only balances the load of the processors. Therefore, CPM is closer to a load balancing scheme.

3.3 ETF

The ETF algorithm is a list scheduling algorithm based on a dynamic task priority scheme. At each scheduling step, the priorities of the ready unmapped tasks are computed. The task priority is the earliest start time, which is determined by tentatively mapping the given tasks to all processors. The task with the minimum priority is selected and mapped to the processor corresponding to this priority. Ties are broken by considering a statically computed priority. The time complexity of ETF is O(V²P). ETF has a higher complexity than MCP because at each step it is required to recompute the task priorities. Yet, the performance does not seem to improve [6]. The main idea is to keep the processors busy, in this respect being close to a load balancing scheme. Because of that, the ETF algorithm does not always map the most important ready tasks first (i.e., the tasks on the critical path).

3.4 DSC-LLB

DSC-LLB is a multi-step scheduling algorithm. The first step, applying DSC, is intended to minimize communication by grouping the highly communicating tasks together in clusters. The second step, using LLB, maps the clusters to

the existing processors and orders the tasks within the clusters. In the DSC algorithm, task priorities are dynamically computed as the sum of the task's top level and bottom level. The top level and the bottom level are the sums of the computation and communication costs along the longest path from the given task to an entry task and to an exit task, respectively. Again, communication costs are assumed to be zero between two tasks mapped to the same processor. While the bottom levels are statically computed at the beginning, the top levels are computed incrementally during the scheduling process. The tasks are scheduled in the order of their priorities. The destination processor is either the processor from which the last message arrives, or a new processor, depending on which of the two allows the given task to start earlier. The time complexity of DSC is O((E + V) log(V)).

In the LLB algorithm, a task is mapped to a processor if there is at least one other task from the same cluster already scheduled on that processor, and is unmapped otherwise. LLB is a load balancing scheme. First, the destination processor is selected as the processor that becomes idle the earliest. Second, the task to be scheduled is selected. There are two candidates: (a) the task already mapped to the selected processor having the least bottom level, or (b) the unmapped task with the least bottom level. The one able to start the earliest is scheduled. The time complexity of LLB is O(C log(C) + V), where C is the number of clusters obtained in the clustering step.

DSC-LLB is a low-cost algorithm. Not surprisingly, compared to a higher-cost scheduling algorithm such as MCP, it has worse scheduling performance. However, the DSC-LLB output performance has still been shown to stay within a few tens of percent of the MCP output performance, while outperforming other known multi-step scheduling algorithms [7].
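The bottom level used by DSC (and, later, a natural static priority for list schedulers) can be computed in a single memoized traversal. The sketch below is hypothetical (names and graph are illustrative, not from the paper): the bottom level of a task is the length of the longest computation-plus-communication path from that task to an exit task.

```python
# Hypothetical sketch: computing bottom levels by memoized recursion.
# Each task and edge is visited once, so the cost is O(E + V).

comp = {"a": 2, "b": 3, "c": 1, "d": 2}
comm = {("a", "b"): 4, ("a", "c"): 2, ("b", "d"): 1, ("c", "d"): 5}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

blevel = {}
def bottom_level(t):
    if t not in blevel:  # memoize so each task is expanded only once
        blevel[t] = comp[t] + max(
            (comm[(t, s)] + bottom_level(s) for s in succ[t]), default=0)
    return blevel[t]

levels = {t: bottom_level(t) for t in comp}
print(levels)  # {'a': 12, 'b': 6, 'c': 8, 'd': 2}
```

Here d is an exit task, so its bottom level is just comp(d); a's bottom level is 2 + max(4 + 6, 2 + 8) = 12, the critical path length of this graph.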
4 The FCP Algorithm

4.1 Rationale

As mentioned earlier, list scheduling algorithms generally perform better than other scheduling algorithms for a bounded number of processors, such as multi-step methods [7]. However, list scheduling algorithms have a higher complexity than multi-step scheduling algorithms. Our goal is to reduce the time complexity of list scheduling algorithms, while maintaining their good results.

Analyzing list scheduling algorithms, one can distinguish several steps. The first step is the computation of the task priorities, which takes at least O(E + V) time, since the whole task graph has to be traversed. The second step, sorting the tasks according to their priorities, takes O(V log(V)) time. The third step, task scheduling, schedules the sorted tasks one at a time on each task's best processor. In list scheduling, the best processor is usually considered to be the processor on which the task to be scheduled starts the earliest (e.g., MCP, ETF). Computing the start times for all tasks requires traversing all tasks and edges, leading to a time complexity of O(E + V). As each task is tentatively scheduled on all processors to find its earliest start time, processor selection takes O((E + V)P) time. Finally, scheduling a task on a processor takes only O(1) time, because the start time of the task on the selected processor was already computed in the previous step. Therefore, the highest-complexity steps of list scheduling algorithms are the second and the third steps, which have O(V log(V)) and O((E + V)P) time complexity, respectively. As for practical problems E has the same order of magnitude as V, the second step usually has the higher cost.

A first way to reduce the complexity of the task sorting step is not to sort all the tasks from the beginning, but to keep only the ready tasks sorted throughout the scheduling process. However, although the sorting time is reduced, it still has the same complexity in the worst case: O(V log(V)). This complexity can be effectively reduced by maintaining only a constant-size sorted list of ready tasks. The other ready tasks are stored in an unsorted FIFO list, which has O(1) access time. When a task becomes ready, it is added to the sorted list if there is room to accommodate it; otherwise it is added to the FIFO list. For this reason, as long as the sorted list is not full, there can be no tasks in the FIFO list. Tasks are always dequeued from the sorted list. After a dequeue operation, if the FIFO list is not empty, one task is moved from the FIFO list to the sorted list. The time complexity of sorting the tasks using a sorted list of size H decreases to O(V log(H)), as each task is enqueued in and dequeued from the sorted list only once.

A possible drawback of using a fixed-size sorted task list is that the task with the highest priority may not be included in the sorted list, but may be temporarily stored in the FIFO list. The size of the sorted list must therefore be large enough not to affect the performance of the algorithm too much. At the same time, it should not be too large in view of the time complexity. In our experiments, we find that a size of H = P is sufficient to achieve a performance comparable to the original list scheduling algorithm (see Section 5). A sorted list size of P results in a task sorting complexity of O(V log(P)).

The O((E + V)P) time complexity of list scheduling processor selection can also be reduced, by restricting the choice of the destination processor from all processors to only two: (a) the processor from which the last message to the given task arrives, or (b) the processor which becomes idle the earliest. In the appendix we prove that the start time of a given task is minimized on one of these two destination processors. The proof is based on the fact that the start time of a task t on a candidate processor p is defined as the maximum of (a) the time the last message to t arrives from a different processor, and (b) the time p becomes idle.
The task start time is minimized on one of the two processors that minimize these two components of the start time. As a consequence, there are two possible destination processors: (a) the processor from which the

last message to the given task arrives, because mapping the task on this processor is the only case in which the cost of the last message is zeroed, and (b) the processor becoming idle the earliest. This implies that restricting the selection to these two processors does not affect the performance of the algorithm, while drastically reducing its complexity to O(V log(P) + E). Using the described techniques for task sorting and processor selection, the total complexity of the scheduling algorithm decreases to O(V log(P) + E), which is clearly a significant improvement over the typical time complexity O(V log(V) + (E + V)P) of current list scheduling approaches.

4.2 The FCP Algorithm

The FCP algorithm is described in this section. It is based on three procedures, AddReadyTask, SelectReadyTask and SelectProcessor, which are described first.

AddReadyTask (task)
BEGIN
  IF size(priority_list) < H THEN
    Enqueue_sorted(task, priority_list);
  ELSE
    Enqueue_FIFO(task, FIFO_list);
  END IF
END

AddReadyTask adds a ready task to the partially sorted ready task set. The task set is implemented as a fixed-size priority list and a FIFO list. If the fixed-size priority list is not full, the task is added to the priority list; otherwise it is added to the FIFO list.

SelectReadyTask ()
BEGIN
  task := Dequeue(priority_list);
  IF FIFO_list is not empty THEN
    t := Dequeue(FIFO_list);
    Enqueue_sorted(t, priority_list);
  END IF
  RETURN task;
END

SelectReadyTask returns the task with the highest priority from the priority list. The priority list must be full as long as there are tasks in the FIFO list. Therefore, if there are tasks in the FIFO list after a task is dequeued from the priority list, one task from the FIFO list must be moved to the priority list. Using this approach, the priority list is always full whenever there are tasks in the FIFO list.
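In Python, the AddReadyTask/SelectReadyTask pair could be sketched as below (a hypothetical sketch using a size-H heap plus a FIFO; the paper does not prescribe a concrete implementation):

```python
import heapq
from collections import deque

class ReadyTasks:
    """Hypothetical sketch of AddReadyTask/SelectReadyTask: at most H
    tasks are kept sorted (in a heap); the rest wait in a FIFO list.
    Each task enters and leaves the heap at most once: O(log H) per task."""
    def __init__(self, H):
        self.H, self.heap, self.fifo = H, [], deque()

    def add(self, priority, task):            # AddReadyTask
        if len(self.heap) < self.H:
            heapq.heappush(self.heap, (-priority, task))  # highest priority first
        else:
            self.fifo.append((priority, task))

    def pop(self):                            # SelectReadyTask
        _, task = heapq.heappop(self.heap)
        if self.fifo:                         # keep the heap full
            p, t = self.fifo.popleft()
            heapq.heappush(self.heap, (-p, t))
        return task

rt = ReadyTasks(H=2)
for prio, name in [(5, "t5"), (1, "t1"), (9, "t9"), (7, "t7")]:
    rt.add(prio, name)
print([rt.pop() for _ in range(4)])  # ['t5', 't9', 't7', 't1']
```

Note how t9, the globally highest-priority task, overflows into the FIFO and is returned only second. This is exactly the drawback discussed in Section 4.1, and the reason the list size H must be chosen large enough (H = P in the paper).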
SelectProcessor (task)
BEGIN
  pA := processor from which the last message arrives;
  pB := processor becoming idle the earliest;
  IF ST(task, pA) < ST(task, pB) THEN
    p := pA;
  ELSE
    p := pB;
  END IF
  RETURN p;
END

SelectProcessor is the processor selection procedure. The two processor candidates are (a) the processor from which the last message is received (pA) and (b) the processor becoming idle the earliest (pB). The first is used to reduce communication, while the latter is intended to ensure processor load balancing. The task start time is computed for both candidate processors, and the one yielding the earlier start time is selected.

FCP ()
BEGIN
  FOR t ∈ V DO
    ComputePriority(t);
    IF t is an entry task THEN
      AddReadyTask(t);
    END IF
  END FOR
  WHILE NOT all tasks scheduled DO
    task := SelectReadyTask();
    p := SelectProcessor(task);
    ScheduleTask(task, p);
    FOR t ∈ new ready task set DO
      AddReadyTask(t);
    END FOR
  END WHILE
END

The FCP algorithm uses static task priorities, which are computed before the scheduling loop starts. Also, the ready task set is initialized with the entry tasks. The scheduling loop is repeated as long as there are unscheduled tasks. At each iteration, one task is scheduled. The task to be scheduled is selected among the ready tasks using SelectReadyTask, described above. The destination processor for the given task is selected using SelectProcessor, also described above. The task is scheduled on the selected processor after the last task scheduled on that processor. Before continuing with the next iteration, the ready task set is updated by adding the successors that become ready as a result of the current scheduling decision.

The complexity of the FCP algorithm is as follows. Computing the task priorities takes O(E + V). Each task is added once to and removed once from the partially sorted ready task set. Both operations take O(log(H)) per task. As there are V tasks, maintaining the task lists takes O(V log(H)).
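Putting the pieces together, the full FCP loop can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: it uses bottom levels as the static priority (one plausible choice; the paper leaves ComputePriority abstract) and, for entry tasks, falls back to processor 0 as the "last message" candidate.

```python
import heapq
from collections import deque

def fcp_schedule(comp, comm, succ, num_procs, H=None):
    """Hypothetical FCP sketch: size-H sorted heap + FIFO for ready tasks,
    processor choice restricted to last-message and earliest-idle processors."""
    tasks = list(comp)
    pred = {t: [] for t in tasks}
    for t in tasks:
        for s in succ.get(t, ()):
            pred[s].append(t)

    blevel = {}                           # static priority: bottom level
    def bl(t):
        if t not in blevel:
            blevel[t] = comp[t] + max(
                (comm[(t, s)] + bl(s) for s in succ.get(t, ())), default=0)
        return blevel[t]
    for t in tasks:
        bl(t)

    H = num_procs if H is None else H
    heap, fifo = [], deque()              # sorted list (size <= H) + FIFO overflow
    def add_ready(t):
        if len(heap) < H:
            heapq.heappush(heap, (-blevel[t], t))
        else:
            fifo.append(t)
    def select_ready():
        _, t = heapq.heappop(heap)
        if fifo:                          # refill the sorted list from the FIFO
            u = fifo.popleft()
            heapq.heappush(heap, (-blevel[u], u))
        return t

    PE, ST, FT = {}, {}, {}
    idle = [0] * num_procs                # PIT(p) per processor
    waiting = {t: len(pred[t]) for t in tasks}
    for t in tasks:
        if waiting[t] == 0:
            add_ready(t)

    for _ in range(len(tasks)):
        t = select_ready()
        def start_on(p):                  # earliest start time of t on p
            arrivals = [FT[x] + (0 if PE[x] == p else comm[(x, t)])
                        for x in pred[t]]
            return max([idle[p]] + arrivals)
        # Candidate (a): processor of the last arriving message (0 for entry tasks).
        p_last = (PE[max(pred[t], key=lambda x: FT[x] + comm[(x, t)])]
                  if pred[t] else 0)
        # Candidate (b): processor becoming idle the earliest.
        p_idle = min(range(num_procs), key=idle.__getitem__)
        p = min((p_last, p_idle), key=start_on)
        PE[t], ST[t] = p, start_on(p)
        FT[t] = ST[t] + comp[t]
        idle[p] = FT[t]
        for s in succ.get(t, ()):
            waiting[s] -= 1
            if waiting[s] == 0:
                add_ready(s)
    return PE, ST, max(idle)

comp = {"a": 1, "b": 1, "c": 1, "d": 1}
comm = {("a", "b"): 10, ("a", "c"): 10, ("b", "d"): 10, ("c", "d"): 10}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
PE, ST, makespan = fcp_schedule(comp, comm, succ, num_procs=2)
print(makespan)  # heavy communication keeps all four tasks on one processor: 4
```

On this diamond graph the communication costs dominate, so the last-message candidate wins every comparison and the whole graph is serialized on one processor, exactly the behavior one would expect from a communication-aware heuristic at high CCR.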
For each task, finding the processor from which the last message is received implies scanning the task's incoming edges; over all tasks and edges this takes O(E + V). Finding the processor becoming idle the earliest takes O(log(P)) per task, implying O(V log(P)) for all tasks. As a result, the total complexity of FCP is O(V(log(H) + log(P)) + E). A priority list of size P yields good results, as indicated by the experiments, which implies a total complexity of O(V log(P) + E).

5 Performance Results

The FCP algorithm is compared with the four algorithms described in Section 3: MCP, ETF, CPM and DSC-LLB. The four algorithms are well known, use different scheduling schemes, and have been shown to obtain competitive results [6, 7, 10, 11]. We selected the lower-cost version of MCP, in which, if there are several tasks with the same priority, the task to be scheduled is chosen randomly. For CPM, we also considered communication when computing the task priorities; however, no improvement was made to its processor selection scheme. For FCP, we use a priority list size of P when comparing it with the other algorithms (this choice is explained in Section 5.3).

We consider task graphs representing various types of parallel algorithms. The selected problems are LU decomposition ("LU"), a Laplace equation solver ("Laplace"), a stencil algorithm ("Stencil") and the Fast Fourier Transform ("FFT"). Miniature sample task graphs of each type are shown in Figure 1. For each of these problems, we adjusted the problem size to obtain task graphs of about 2000 nodes. For each problem, we varied the task graph granularity by varying the communication-to-computation ratio (CCR). The values used for the CCR were 0.2, 1.0 and 5.0. For each problem and each CCR value, we generated 5 graphs with random execution times and communication delays (i.i.d. uniform distribution with unit coefficient of variation).

[Figure 1: Miniature task graphs]

5.1 Running Times

Our main objective is to reduce the task scheduling cost (i.e., the running time), while maintaining performance. In Figure 2, the average running times of the algorithms are shown in CPU seconds, as measured on a Pentium Pro/233MHz PC with 64 MB RAM running Linux. ETF is the most costly among the compared algorithms: its running time increases from 2.5 s for 2 processors up to 65 s for 32 processors. MCP also has a runtime proportional to the number of processors, but its cost is significantly lower: for P = 2 it runs for 0.3 s, while for P = 32 the running time is 1.7 s. CPM is the fastest list scheduling algorithm in this particular comparison.
Its running times increase logarithmically with P, from 0.17 s for 2 processors to 0.28 s for 32 processors. DSC-LLB does not vary with P at all, as its most costly step, clustering, is independent of the number of processors; the DSC-LLB running times vary around 0.7 s. FCP's running time is the lowest, comparable with CPM, and increases logarithmically with P from 0.17 s for 2 processors to 0.26 s for 32 processors.

[Figure 2: Scheduling algorithm cost comparison]

5.2 Scheduling Performance

In this section we show that the FCP algorithm achieves the same performance as the existing, more expensive list scheduling algorithms. For the performance comparison, we use the normalized schedule length (NSL), which is defined as the ratio between the schedule length produced by the given algorithm and the schedule length produced by a reference algorithm. In Figure 3, we show the average normalized schedule lengths for the selected algorithms and problems, where the reference algorithm is MCP. For each of the considered CCR values, a set of NSL values is presented. Note that the NSL generally increases with P, as a result of the limited parallelism in the task graphs.

MCP and ETF consistently yield relatively good schedules. Depending on the problem and its granularity, either one or the other performs better. The differences are greater for fine-grained problems, which are more sensitive to scheduling decisions. For LU, MCP schedules are better by up to roughly 20%, while for Laplace and Stencil, ETF schedules are better by up to roughly 10%. For coarse-grain problems, the results are comparable; the only exception is LU, for which ETF performs up to 20% worse. The third list scheduling algorithm, CPM, does not try to reduce communication costs, but only balances the load of the processors. Consequently, it has a very simple processor selection scheme, which explains its low cost. However, its performance worsens even further when communication is considered, the degradation rising to more than a factor of 2 for coarse-grain LU on 32 processors.

DSC-LLB is a multi-step method intended to obtain good results while aiming for minimal complexity. Its scheduling performance is not much worse than that of MCP and ETF: typically, its schedule lengths are no more than about 20% larger than the MCP and ETF schedule lengths. Although in some cases the difference can be higher, there are also cases in which DSC-LLB performs up to 10% better than a list scheduling algorithm. FCP has both low cost and good performance. Compared to the other provably low-cost algorithms, FCP has consistently better performance, while its cost is lower.

[Figure 3: Scheduling algorithms performance comparison (NSL per problem, for CCR = 0.2, 1.0 and 5.0)]

[Figure 4: FCP speedup (per problem, for CCR = 0.2, 1.0 and 5.0)]

Compared to the two more expensive algorithms, MCP and ETF, one can note that FCP usually performs comparably to the better of the two. The only case in which FCP performs comparably to only the second best is fine-grained Stencil, for which ETF has a slightly better performance.

Finally, in Figure 4 we show the FCP speedup for the considered problems. For all the problems, FCP obtains significant speedup. For coarse-grain problems, the speedup is almost linear, while for fine-grain problems, because of the limited parallelism available, the speedup starts levelling off earlier. For LU and Laplace, there are a large number of join operations; as a consequence, there is not much parallelism available and the speedup is lower. Stencil and FFT are more regular, therefore more parallelism can be exploited and better speedup is obtained.

5.3 Priority List Size Sensitivity on Performance

As mentioned earlier, an important improvement of our algorithm is based on the fact that only a fixed number of tasks needs to be kept sorted at each scheduling step. In Figure 5, we study the influence of the priority list size on the scheduling performance of FCP. As the reference algorithm when computing the NSL, the FCP algorithm with a priority list of size P was selected. The results represent mean values over the four considered problems.

[Figure 5: The influence of the priority list size on FCP performance, for list sizes from H = 0 up to H = 2P]

For small numbers of processors, when there is more parallelism to be exploited, FCP yields good results even for small priority list sizes. Even for a zero-length priority list, which amounts to scheduling the tasks in FIFO order as they become ready, FCP still obtains good results. For larger numbers of processors, the parallelism in the problems decreases, and as a consequence the performance degrades for small priority list sizes. However, one can note that for a priority list size greater than P, little further improvement is obtained. If the number of ready tasks is greater than P, the scheduling process tends to become a load balancing scheme. The reason is that, after mapping the first P ready tasks, the communication costs of the remaining ready tasks tend to be overlapped with the execution of the previously mapped P tasks. The task priorities, used to select the task least likely to be delayed by communication costs, therefore become less important, and a priority list of smaller size can be used. When there is only a small number of ready tasks at each scheduling step, the task priorities become more important; in this case, the tasks must be kept sorted to obtain good performance. From the above experiments, it can be concluded that a priority list size of P is a good choice for the FCP algorithm: a smaller size penalizes problems with limited parallelism, while a greater size does not yield further improvements.

6 Conclusion

In this paper, a new list scheduling algorithm, called Fast Critical Path (FCP), has been presented. FCP is intended as a compile-time scheduling algorithm for distributed-memory systems. While similar to existing list scheduling algorithms, FCP has two important differences: (a) it does not sort all the tasks at the beginning, but maintains only a limited number of tasks sorted at any given time, and (b) instead of considering all processors as possible destinations for a given task, the choice is restricted to either the processor from which the last message to the given task arrives or the processor which becomes idle the earliest. Using this approach, the time complexity is reduced to O(V(log(H) + log(P)) + E). It is shown that a priority list size of P (H = P) yields good scheduling performance; as a result, the FCP complexity becomes O(V log(P) + E).

Experimental results show that, compared with known scheduling algorithms, FCP obtains schedules comparable to those of the more expensive list scheduling algorithms, MCP and ETF. Yet the FCP complexity is lower even when compared to low-cost scheduling algorithms like DSC-LLB and CPM, which have complexities of O((E + V) log(V)) and O(V(log(V) + log(P))), respectively, while FCP's scheduling performance is consistently better. In summary, despite its very low complexity, FCP outperforms other low-cost algorithms and even matches the better-performing, higher-cost list scheduling algorithms.

References

[1] I. Ahmad and Y.-K. Kwok. A new approach to scheduling parallel programs using task duplication. In Proc. ICPP, pages 47-51, Aug.
[2] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., 1979.
[3] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM J. on Applied Mathematics, 17(2), Mar. 1969.
[4] J.-J. Hwang, Y.-C. Chow, F. D. Anger, and C.-Y. Lee. Scheduling precedence graphs in systems with interprocessor communication times. SIAM J. on Computing, 18, Apr. 1989.
[5] B. Kruatrachue and T. G. Lewis. Grain size determination for parallel processing. IEEE Software, pages 23-32, Jan. 1988.
[6] Y.-K. Kwok and I. Ahmad. Benchmarking the task graph scheduling algorithms. In Proc. 1st Merged IPPS/SPDP, Mar. 1998.
[7] A. Rădulescu, A. J. C. van Gemund, and H.-X. Lin. LLB: A fast and effective scheduling algorithm for distributed-memory systems. In Proc. 2nd Merged IPPS/SPDP, San Juan, Puerto Rico, Apr. 1999. IEEE.
[8] V. Sarkar. Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors. PhD thesis, MIT, 1987.
[9] B. Shirazi, M. Wang, and G. Pathak. Analysis and evaluation of heuristic methods for static task scheduling. J. of Parallel and Distributed Computing, 10(3), Nov. 1990.
[10] M.-Y. Wu and D. D. Gajski. Hypertool: A programming aid for message-passing systems. IEEE Trans. on Parallel and Distributed Systems, 1(7), July 1990.
[11] T. Yang and A. Gerasoulis. DSC: Scheduling parallel tasks on an unbounded number of processors. IEEE Trans. on Parallel and Distributed Systems, 5(9), Dec. 1994.

A Processor Selection

In this appendix we prove that, given a task, the choice of the processor on which the task starts the earliest can be restricted to only two processors. The two possible destination processors are either (a) the processor from which the last message to the given task arrives, or (b) the processor which becomes idle the earliest.
The start time of a task t on a candidate processor p is defined as the maximum of (a) the time the last message to t arrives from a different processor, and (b) the time p becomes idle. The task start time is minimized on one of the two processors that minimize these two components of the start time. As a consequence, there are two possible destination processors: (a) the processor from which the last message to the given task arrives, because mapping the task on this processor is the only case in which the cost of the last message is zeroed, and (b) the processor becoming idle the earliest. This is formalized in the following.

Definition 1  The arrival time of a message sent by a task t_x to a task t is defined as:
  T_m(t_x, t) = FT(t_x) + comm(t_x, t)

Definition 2  The time the last message arrives at a task t is defined as the maximum of the message arrival times, or 0 in case the task is an entry task:
  T_lm(t) = max( max over {(t_x, t) ∈ E} of T_m(t_x, t), 0 )

Definition 3  Let t_l be the task from which the last message arrives at task t:
  t_l = t_x ∈ V : T_m(t_x, t) = T_lm(t)

Definition 4  Let p_l denote the processor from which the last message arrives:
  p_l = PE(t_l)

Definition 5  The processor becoming idle the earliest is:
  p_r = p ∈ P : PIT(p) = min over {p_x ∈ P} of PIT(p_x)

Definition 6  The start time of a tentative scheduling of task t on a processor p is defined as:
  TST(t, p) = max( max over {(t_x, t) ∈ E, PE(t_x) ≠ p} of T_m(t_x, t), PIT(p) )

Theorem 1  Let t be the task to be scheduled, and let p = PE(t) be such that
  TST(t, p) = min over {p_x ∈ P} of TST(t, p_x).
Then p ∈ {p_l, p_r}.

Proof  For any p_y ≠ p_l, the last message is sent from a processor other than p_y, that is,
  T_m(t_l, t) ∈ { T_m(t_x, t) : (t_x, t) ∈ E, PE(t_x) ≠ p_y },
which implies
  ∀ p_y ≠ p_l : T_m(t_l, t) ≤ max over {(t_x, t) ∈ E, PE(t_x) ≠ p_y} of T_m(t_x, t).
From Definitions 2 and 3, it follows that
  ∀ p_y ≠ p_l : T_lm(t) = T_m(t_l, t).
As a result,
  ∀ p_y ≠ p_l : max over {(t_x, t) ∈ E, PE(t_x) ≠ p_y} of T_m(t_x, t) = T_lm(t),
which is the maximum value this term can attain, according to Definition 2. As a consequence, the only processor for which the first term of TST(t, p) can be decreased below T_lm(t) is p_l. This implies that the first term of TST(t, p) is minimized for p_l.
According to Definition 5, the second term of $T_{ST}(t,p)$ is minimized for $p = p_I$. Since each of the two terms of $T_{ST}(t,p)$ is minimized either for $p_{LM}$ or for $p_I$, it follows that $p \in \{p_{LM}, p_I\}$. $\Box$
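The theorem can also be checked empirically: compute the tentative start time on every processor by brute force and confirm that the minimum is always attained either on the processor sending the last message or on the earliest-idle processor. The following is a minimal sketch of that check; the function and variable names (`tentative_start`, `pit`, `proc_of`) are illustrative, not taken from the paper.

```python
import random

def tentative_start(msgs, proc_of, pit, p):
    """Start time of a task tentatively placed on processor p:
    the maximum of (latest message arriving from a *different*
    processor) and (the time p becomes idle)."""
    last_remote = max(
        (arrival for (src, arrival) in msgs if proc_of[src] != p),
        default=0.0,
    )
    return max(last_remote, pit[p])

random.seed(1)
P = 4  # number of processors
for trial in range(500):
    # time each processor becomes idle
    pit = [random.uniform(0, 10) for _ in range(P)]
    # incoming messages: (source task id, arrival time); each source
    # task is placed on a random processor
    n_msgs = random.randint(1, 6)
    proc_of = {s: random.randrange(P) for s in range(n_msgs)}
    msgs = [(s, random.uniform(0, 10)) for s in range(n_msgs)]

    # the two candidate processors of Theorem 1
    p_lm = proc_of[max(msgs, key=lambda m: m[1])[0]]  # sender of the last message
    p_idle = min(range(P), key=lambda p: pit[p])      # earliest-idle processor

    best_all = min(tentative_start(msgs, proc_of, pit, p) for p in range(P))
    best_two = min(tentative_start(msgs, proc_of, pit, p) for p in (p_lm, p_idle))
    assert best_all == best_two
```

This is also why FCP's restriction to two candidate processors loses nothing: examining all P processors per task only re-derives a minimum already attained at one of the two candidates.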


More information

Key Grids: A Protocol Family for Assigning Symmetric Keys

Key Grids: A Protocol Family for Assigning Symmetric Keys Key Grids: A Protocol Family for Assigning Symmetric Keys Amitanand S. Aiyer University of Texas at Austin anand@cs.utexas.edu Lorenzo Alvisi University of Texas at Austin lorenzo@cs.utexas.edu Mohamed

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

A 2-Approximation Algorithm for the Soft-Capacitated Facility Location Problem

A 2-Approximation Algorithm for the Soft-Capacitated Facility Location Problem A 2-Approximation Algorithm for the Soft-Capacitated Facility Location Problem Mohammad Mahdian Yinyu Ye Ý Jiawei Zhang Þ Abstract This paper is divided into two parts. In the first part of this paper,

More information

A Duplication Based List Scheduling Genetic Algorithm for Scheduling Task on Parallel Processors

A Duplication Based List Scheduling Genetic Algorithm for Scheduling Task on Parallel Processors A Duplication Based List Scheduling Genetic Algorithm for Scheduling Task on Parallel Processors Dr. Gurvinder Singh Department of Computer Science & Engineering, Guru Nanak Dev University, Amritsar- 143001,

More information

APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH

APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH Daniel Wespetal Computer Science Department University of Minnesota-Morris wesp0006@mrs.umn.edu Joel Nelson Computer Science Department University

More information

A Note on Karr s Algorithm

A Note on Karr s Algorithm A Note on Karr s Algorithm Markus Müller-Olm ½ and Helmut Seidl ¾ ½ FernUniversität Hagen, FB Informatik, LG PI 5, Universitätsstr. 1, 58097 Hagen, Germany mmo@ls5.informatik.uni-dortmund.de ¾ TU München,

More information