Efficient Algorithms for Scheduling and Mapping of Parallel Programs onto Parallel Architectures


Efficient Algorithms for Scheduling and Mapping of Parallel Programs onto Parallel Architectures

by Yu-Kwong KWOK

A Thesis Presented to The Hong Kong University of Science and Technology in Partial Fulfilment of the Requirements for the Degree of Master of Philosophy in Computer Science

Hong Kong, June 1994

Copyright by Yu-Kwong KWOK 1994

Authorisation

I hereby declare that I am the sole author of the thesis. I authorise the Hong Kong University of Science and Technology to lend this thesis to other institutions or individuals for the purpose of scholarly research. I further authorise the Hong Kong University of Science and Technology to reproduce the thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Efficient Algorithms for Scheduling and Mapping of Parallel Programs onto Parallel Architectures

by Yu-Kwong KWOK

APPROVED:
Dr. Ishfaq AHMAD, Lecturer (Advisor)
Dr. Jogesh K. MUPPALA, Lecturer
Dr. Helen C. SHEN, Senior Lecturer
Department of Computer Science
June 10, 1994

Acknowledgements

I would like to sincerely thank my advisor, Dr. Ishfaq Ahmad, for his patience, guidance and invaluable advice on my studies. I am very grateful for his continual support in both academic and personal matters. I am most grateful to my wife, Fyon, for her love and patience, which keep me working whenever I am frustrated. Without the encouragement of Dr. Ahmad and Fyon, it would have been difficult to continue my graduate studies. I am also very grateful to Dr. Jogesh K. Muppala and Dr. Helen C. Shen for their helpful reviews of and suggestions on the thesis. Thanks are also extended to Dr. Siu-Wing Cheng and Dr. Michael Kaminski for their invaluable advice on my studies. I enjoyed working with Richard Li, Eric Yeung and Warren Tsui on the CASCH project, and I thank them for their precious contributions. I would like to acknowledge the Hong Kong Research Grants Council for supporting this work (under contract number HKUST 179/93E). Finally, I thank the Computer Science Department for its generosity in providing a nice and convenient work environment for its graduate students. I hope that the department will continue to do so in the future.

Table of Contents

Title Page
Authorisation Page
Signature Page
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abstract

Chapter 1 Introduction
  1.1 Overview
  1.2 Parallel Architectures and the Scheduling Problem
  1.3 A Taxonomy of Approaches to the Scheduling Problem
  1.4 Outline of the Thesis

Chapter 2 Evolution of the Scheduling Problem
  2.1 Introduction
  2.2 Problem Statement
  2.3 Optimal Static Scheduling Algorithms
    2.3.1 Optimal Scheduling of Tree-structured Task Graphs
    2.3.2 Optimal Scheduling for Two-processor Systems
  2.4 Heuristic Approaches
  2.5 State-of-the-Art Scheduling Algorithms
    2.5.1 The EZ Algorithm
    2.5.2 The MCP Algorithm
    2.5.3 The MD Algorithm
    2.5.4 The DSC Algorithm
  2.6 Scheduling Using Task Duplication
    2.6.1 The DSH Algorithm
    2.6.2 The BTDH Algorithm
  2.7 Scheduling and Mapping Algorithms
    2.7.1 The MH Algorithm
    2.7.2 The DLS Algorithm
  2.8 Summary

Chapter 3 The Dynamic Critical Path Scheduling Algorithm
  3.1 Introduction
  3.2 The Proposed Algorithm
    3.2.1 Design Principles
    3.2.2 The DCP Algorithm
  3.3 An Application Example
  3.4 Workload Generation
  3.5 The Performance Comparison
    3.5.1 Comparison of Schedule Lengths
    3.5.2 A Global Comparison
    3.5.3 Number of Processors
    3.5.4 Comparison of Running Times
  3.6 Summary

Chapter 4 Exploiting Task Duplication in Scheduling
  4.1 Introduction
  4.2 The Proposed Algorithms
    4.2.1 Algorithm for Unlimited Number of Processors
    4.2.2 Algorithm for Limited Number of Processors
    4.2.3 Algorithm for Heterogeneous Processors
  4.3 An Application Example
  4.4 The Performance Comparison
    4.4.1 Unlimited Homogeneous Processors
    4.4.2 Limited Homogeneous Processors
    4.4.3 Heterogeneous Processors
  4.5 Summary

Chapter 5 The Bubble Scheduling and Allocation Algorithm
  5.1 Introduction
  5.2 The Proposed Algorithm
    5.2.1 Definitions and Notations
    5.2.2 Description of the Algorithm
    5.2.3 Characteristics of the Proposed Approach
  5.3 An Application Example
  5.4 The Performance Comparison
  5.5 Summary

Chapter 6 Conclusions and Future Work

References

List of Figures

Figure 1.1: (a) A shared-memory architecture; (b) Message-passing (distributed memory) architectures
Figure 1.2: (a) A taxonomy of the approaches to the scheduling problem; (b) A task interaction graph; (c) A task precedence graph
Figure 2.1: (a) A simple tree-structured task graph with unit-cost tasks and without communication among tasks; (b) The optimal schedule of the task graph using three processors
Figure 2.2: (a) A simple task graph with unit-cost tasks and without communication among tasks; (b) The optimal schedule of the task graph in a two-processor system
Figure 2.3: (a) A task graph; (b) The schedule generated by the HLFET algorithm (schedule length = 43 time units); (c) The best possible schedule (schedule length = 34 time units)
Figure 2.4: The schedule of the task graph in Figure 2.3(a) generated by the EZ algorithm (schedule length = 35 time units)
Figure 2.5: (a) ASAP binding and (b) ALAP binding of the task graph in Figure 2.3(a)
Figure 2.6: (a) A randomly generated task graph; (b) An initial schedule without duplication (schedule length = 299 time units); (c) The final schedule produced by the DSH algorithm (schedule length = 275 time units); (d) An intermediate schedule in which the duplication of n_2 increases the start times of n_3 and n_5; (e) The final schedule produced by the BTDH algorithm (schedule length = 246 time units)
Figure 2.7: (a) A simple task graph; (b) an intermediate schedule generated by MH after node n_4 is scheduled; (c) another intermediate schedule generated by MH after node n_5 is scheduled; and (d) the final schedule produced by the MH algorithm
Figure 3.1: A parallel Gaussian elimination task graph
Figure 3.2: The schedule of the Gaussian elimination task graph generated by the EZ algorithm (schedule length = 600 time units)
Figure 3.3: The schedule of the Gaussian elimination task graph generated by the MCP algorithm and the DLS algorithm (schedule length = 5 time units)
Figure 3.4: The schedule of the Gaussian elimination task graph generated by the DSC algorithm (schedule length = 460 time units)
Figure 3.5: The schedule of the Gaussian elimination task graph generated by the MD algorithm (schedule length = 460 time units)
Figure 3.6: The schedule of the Gaussian elimination task graph generated by the DCP algorithm (schedule length = 4 time units)
Figure 3.7: (a) An in-tree task graph; (b) An out-tree task graph; (c) A fork-join task graph; (d) An LU-decomposition task graph; (e) A mean value analysis task graph; (f) A Laplace equation solver task graph; (g) An FFT task graph
Figure 3.8: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for the Gaussian elimination graph; algorithm ranking: DCP, (MD, MCP), DLS, DSC, EZ
Figure 3.9: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for the Laplace equation graph; algorithm ranking: DCP, DLS, DSC, MCP, MD, EZ
Figure 3.10: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for the LU-decomposition graph; algorithm ranking: DCP, MCP, DLS, MD, DSC, EZ
Figure 3.11: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for the fast Fourier transform graph; algorithm ranking: DCP, MD, MCP, DLS, EZ, DSC
Figure 3.12: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for the mean value analysis graph; algorithm ranking: DCP, (DSC, MCP), (DLS, EZ), MD
Figure 3.13: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for in-tree graphs; algorithm ranking: DCP, EZ, DSC, MD, MCP, DLS
Figure 3.14: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for out-tree graphs; algorithm ranking: DCP, DLS, DSC, (MCP, EZ), MD
Figure 3.15: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for completely random graphs; algorithm ranking: DCP, MD, (MCP, DLS), EZ, DSC
Figure 3.16: Average normalized schedule lengths (with respect to lower bounds) at various graph sizes for fork-join graphs; algorithm ranking: DCP, MD, MCP, DSC, DLS, EZ
Figure 3.17: A global comparison of the six algorithms in terms of better, worse and equal performance
Figure 3.18: Average number of processors used by each algorithm
Figure 3.19: Average running time for each algorithm
Figure 4.1: (a) A parallel Gaussian elimination task graph and (b) its optimal schedule without using duplication (schedule length = 3 time units)
Figure 4.2: The schedule generated by the DSH/HLFET algorithm and the BTDH/HLFET algorithm (schedule length = 3 time units)
Figure 4.3: The schedule generated by the CPFD algorithm (schedule length = 0 time units)
Figure 4.4: The schedule generated by the ECPFD algorithm (schedule length = 310 time units)
Figure 4.5: The schedule generated by the HCPFD algorithm onto heterogeneous processors with variance factor = 0.5 (schedule length = 182 time units)
Figure 4.6: Average number of processors used by the DSH, BTDH and CPFD algorithms
Figure 4.7: Efficiency of the BTDH, CPFD and ECPFD (100%, % and 50% processors) algorithms for various task graph sizes and CCRs
Figure 4.8: Percentage improvement in schedule length with HCPFD (VF = 0.2, 0.3, 0.5, 0.9) over CPFD at various task graph sizes and CCRs
Figure 5.1: A task graph representing the mean value analysis algorithm for a matrix of dimension 5x
Figure 5.2: The schedules generated by (a) the MH algorithm (schedule length = 983, total communication cost incurred = 833) and (b) the DLS algorithm (schedule length = 998, total communication cost incurred = 748)
Figure 5.3: The intermediate schedules generated by the BSA algorithm (a) after phase-1 (schedule length = 11, total communication cost incurred = 187) and (b) after phase-2 (schedule length = 1187, total communication cost incurred = 3)
Figure 5.4: (a) The intermediate schedule generated by the BSA algorithm after phase-3 (schedule length = 1056, total communication cost incurred = 493); (b) The final schedule generated by the BSA algorithm (schedule length = 936, total communication cost incurred = 6)
Figure 5.5: Normalized schedule lengths of regular graphs with varying dimensions on (a) 8-node ring, (b) 8-node hypercube, (c) 16-node ring, (d) 16-node hypercube, (e) 8-node random, and (f) 8-node fully connected topology
Figure 5.6: Normalized schedule lengths of random graphs with varying numbers of nodes on (a) 8-node ring, (b) 8-node hypercube, (c) 16-node ring, (d) 16-node hypercube, (e) 8-node random, and (f) 8-node fully connected topology

List of Tables

Table 3.1: Symbols and their meanings
Table 4.1: Notations and their meanings
Table 4.2: A performance comparison of the DSH, BTDH and CPFD scheduling algorithms using an unlimited number of processors
Table 4.3: A comparison of the BTDH and ECPFD algorithms with a limited number of processors
Table 5.1: Notations and their meanings
Table 5.2: A relative comparison of the MH, DLS and BSA algorithms for a 500-node mean value analysis task graph on various topologies
Table 5.3: A relative comparison of the MH, DLS and BSA algorithms for a 500-node Gaussian elimination task graph on various topologies
Table 5.4: A relative comparison of the MH, DLS and BSA algorithms for a 500-node Laplace equation solver task graph on various topologies
Table 5.5: A relative comparison of the MH, DLS and BSA algorithms for a 500-node LU-decomposition task graph on various topologies
Table 5.6: A relative comparison of the MH, DLS and BSA algorithms for a 500-node random task graph on various topologies

Efficient Algorithms for Scheduling and Mapping of Parallel Programs onto Parallel Architectures

by Yu-Kwong KWOK

Department of Computer Science
Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

June 1994

Abstract

Scheduling and mapping of computations onto processors is one of the crucial components of a parallel processing environment. Since the scheduling and mapping problems are known to be NP-complete in many variants, most previous solutions are based on heuristics. However, these approaches make simplifying assumptions about the parallel program and the target machine architecture and are therefore useful only in very limited environments. In this thesis, we propose efficient algorithms for compile-time scheduling and mapping of parallel programs onto parallel processing systems under more realistic assumptions. Our first algorithm, called the Dynamic Critical Path (DCP) algorithm, is designed for scheduling arbitrary task graphs on an unlimited number of fully connected processors and is based on new principles. The DCP algorithm outperforms all of the contemporary scheduling algorithms known to us. The second algorithm, called the Critical Path Fast Duplication (CPFD) algorithm, is designed to exploit task duplication in scheduling. Using task duplication can drastically reduce the communication overhead. The CPFD algorithm outperforms two recently reported algorithms. We have developed a number of versions of this algorithm which can be used with a limited or unlimited number of homogeneous as well as heterogeneous processors, such as a cluster of workstations. The most attractive feature of these algorithms is that they can automatically adjust the degree of duplication depending upon the speed of the communication network and the number of processors available in the target system. The third algorithm, named the Bubble Scheduling and Allocation (BSA) algorithm, is based on a novel technique which differs from the classical methods. It works by first injecting all of the tasks of a parallel program into one processor in a serial fashion. The tasks are then bubbled up to other processors depending upon the network topology. The BSA algorithm not only schedules tasks but also schedules communication edges on the communication channels. The algorithm can be used with any routing strategy and can optimize itself on any network topology. Our five proposed algorithms, as well as a number of algorithms proposed by others, have been implemented in an interactive software tool called CASCH (Computer-Aided SCHeduling).

Chapter 1 Introduction

1.1 Overview

During the past few years, we have witnessed a spectacular growth of parallel computing hardware platforms [23], []. This is because a variety of architectures have emerged exploiting advancements in processor technology, low-overhead switches, fast communication channels, and rich interconnection network topologies. As the hardware of parallel processing systems evolves towards the goal of teraflop performance, the software designers of these systems face increasingly difficult challenges. These include designing new algorithms, programming models, languages, automated programming aids and performance assessment tools. Perhaps the most crucial component of efficient parallel processing software is the scheduling and allocation of the modules of a parallel program to the processors, because the modules of the parallel program must be properly arranged in time and space in order to optimize performance. Given a parallel program represented by a task graph, in which the nodes represent the tasks and the edges represent the communication costs and precedence constraints among the tasks, a scheduling algorithm determines the execution order of the tasks and a mapping algorithm determines the allocation of these tasks to processors. The prime objective of scheduling and mapping is to minimize the execution time. This is equivalent to maximizing the speedup, which is defined as the time required for sequential execution divided by the time required for parallel execution. It is well known that the multiprocessor scheduling problem is NP-complete [26] in its many variants except for a few highly simplified cases [29], [36], [38], [59]. To tackle the

problem, many heuristic algorithms, based on simplifying assumptions about the structure of the parallel programs as well as the underlying parallel processing systems, have been reported in the literature [1], [11], [16], [19], [33], [39], [64], [66], [67], [69]. For more realistic cases, a scheduling algorithm needs to address a number of issues. It should exploit the parallelism by identifying the task graph structure, and take into consideration task granularity, load balancing, arbitrary computation and communication costs, the number of processors, and interprocessor communication. Moreover, in order to be of practical use, a scheduling algorithm should be fast and economical in terms of the number of processors used. Addressing all of these issues makes the scheduling problem even more complex and challenging.

In this thesis, we propose efficient and fast algorithms for compile-time scheduling and mapping of parallel programs onto scalable parallel processing systems, under realistic assumptions. Our first algorithm, called the Dynamic Critical Path (DCP) algorithm, is designed for a virtual architecture composed of an unlimited number of fully connected processors and is based on new principles. The algorithm determines the critical path of the task graph at each scheduling step, and it schedules tasks to processors by rearranging the schedule dynamically, in the sense that the tasks in the partial schedules remain mobile until the scheduling process finishes. The DCP algorithm outperforms all of the contemporary scheduling algorithms known to us. The second algorithm, called the Critical Path Fast Duplication (CPFD) algorithm, is designed to exploit task duplication in scheduling. Using duplication can drastically reduce the communication overhead. The CPFD algorithm outperforms two recently reported algorithms. We have developed a number of versions of this algorithm which can be used with a limited or unlimited number of homogeneous as well as heterogeneous processors, such as a cluster of workstations. The most attractive feature of these algorithms is that they can automatically adjust the degree of duplication depending upon the speed of the communication network and the number of processors available in the target system. The third algorithm, named the Bubble Scheduling and Allocation (BSA) algorithm, is based on a novel technique which differs from the classical methods. It works by first injecting all of the tasks of a parallel program into one processor in a serial fashion. The tasks are then bubbled up to other processors depending upon the network topology. The BSA algorithm not only schedules tasks but also schedules communication edges on the communication channels. The algorithm can be used with any routing strategy and can optimize itself on any network topology. All of our proposed

algorithms are evaluated with a number of suites of task graphs. These include randomly generated graphs of various structures, as well as task graphs for a number of parallel algorithms such as mean value analysis, Gaussian elimination, FFT, LU-decomposition, and the Laplace equation solver.

With the increasing advancement of hardware technology, high-performance parallel processing machines are becoming more readily accessible. However, the lack of efficient parallel programming and algorithm design tools is a major hurdle to applying parallel machines to general applications. For example, without an efficient simulation tool, researchers usually find it very difficult to evaluate the performance of a scheduling algorithm. CASCH (Computer-Aided SCHeduling) is an interactive software tool for studying the performance of scheduling algorithms. We have implemented our proposed algorithms, as well as a number of algorithms developed by others, in the CASCH tool. With this versatile and flexible tool, a researcher or programmer can design and evaluate parallel program scheduling and mapping algorithms in a very efficient manner. The user can interactively draw task graphs which represent parallel programs, draw architecture graphs which represent the target parallel processing systems, and execute the various scheduling algorithms to observe and compare the performance of the parallel programs as well as the scheduling algorithms.

1.2 Parallel Architectures and the Scheduling Problem

Parallel computers can be broadly classified into two categories: shared-memory (Figure 1.1(a)) and message-passing (distributed memory) (Figure 1.1(b)) architectures. Shared-memory machines (e.g., the BBN Butterfly [22]) present a uniform-address-space view of the memory to the programmer; interprocessor communication is through writing and reading shared variables. The hardware generally provides equal-cost access to any shared variable from any of the processors, and there is no notion of communication locality. Message-passing architectures (e.g., hypercubes [32], [57]) use direct communication links between processors. Interprocessor communication and synchronization are achieved through explicit message passing. Each processing element (PE) is connected to a fixed number of PEs in some regular geometry such as a ring or a hypercube (see Figure 1.1(b)). The advantage of this approach over the shared-memory approach is the greater communication bandwidth in the system, due to the large number of simultaneous communications possible on the independent interprocessor links. Another advantage is scalability.

[Figure 1.1: (a) A shared-memory architecture, with processors PE_1, ..., PE_n connected through a shared bus to shared memory modules M_1, ..., M_k; (b) Message-passing (distributed memory) architectures: a 4-processor ring and an 8-processor hypercube.]

We can add new processors as well as communication channels to a message-passing multicomputer at a very low cost. The disadvantage is the longer communication delay when the destination processor is not directly connected to the source processor. Efficient scheduling of a parallel program onto the processors is vital to achieving high performance in both shared-memory and message-passing architectures. The scheduling algorithms proposed in this thesis can be applied to message-passing parallel architectures. These algorithms can be linked with parallel program compilers for performing static (deterministic) scheduling of macro data-flow graphs, which can be generated for the SPMD or MPMD style of parallel programs. These scheduling algorithms are executed off-line, and we do not assume any preemption in our study.
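To make these regular geometries concrete, the following minimal Python sketch (an illustration added here, not part of the thesis) builds the adjacency lists of the two topologies shown in Figure 1.1(b); in a d-dimensional hypercube, two PEs are linked exactly when their binary labels differ in a single bit:

```python
def ring(p):
    # Each PE i in a p-processor ring has exactly two neighbours.
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def hypercube(d):
    # Each PE in a 2^d-processor hypercube is linked to the d PEs whose
    # binary labels differ from its own in exactly one bit.
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(1 << d)}

print(ring(4))       # the 4-processor ring of Figure 1.1(b)
print(hypercube(3))  # the 8-processor hypercube of Figure 1.1(b)
```

Such adjacency information is what a topology-aware scheduler, for example the BSA algorithm of Chapter 5, consults when it routes messages and accounts for contention on the links.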

1.3 A Taxonomy of Approaches to the Scheduling Problem

The scheduling problem is of crucial importance for the effective utilization of large-scale parallel computers and distributed computer networks. In a very broad sense, it is usual to subdivide the general scheduling problem into two categories: job scheduling, and scheduling and mapping (see Figure 1.2(a)). In the former class, independent jobs are to be scheduled among the processors of a distributed computing system to optimize overall system performance. In contrast, the scheduling and mapping problem requires the allocation of multiple interacting tasks of a single parallel program in order to minimize the completion time on the parallel computer system [10], [15], [17], [45], [55], [65]. While job scheduling requires dynamic run-time scheduling that is not a priori decidable, the scheduling and mapping problem can be addressed in both static [4], [7], [9], [18], [39], [42], [43], [44], [50], [51], [52], [56], [58], [61], [62], [67], [70] as well as dynamic contexts [2], [3], [37]. When the structure of the parallel program, in terms of its task execution times, task dependencies, task communications and synchronization, is known a priori, scheduling can be accomplished statically at compile time. On the contrary, dynamic scheduling is required when such a priori information is not available and scheduling is done on-the-fly according to the state of the system [2], [3]. Two distinctly different models of the parallel program have been considered extensively in the context of static scheduling: the task interaction graph (TIG) model and the task precedence graph (TPG) model. They are shown in Figure 1.2(b) and Figure 1.2(c). With the task interaction graph model, graph vertices represent parallel processes and edges denote the inter-process interaction [47]. However, temporal execution dependencies are not explicitly represented; thus, all tasks are considered essentially simultaneously and independently executable. For example, a TIG can be used to model the finite element method (FEM) [55]. The objective of mapping is the minimization of the parallel program completion time [48]. This requires balancing the computation load uniformly among the processors while simultaneously keeping communication costs as low as possible. The mapping problem is analogous to graph-to-graph mapping, since both the problem and machine models can be represented as graphs. The research in this area was pioneered by Stone and Bokhari [12], [13], [14]. Stone [63] applied network-flow algorithms to solve the assignment problem. Bokhari described the mapping problem as being equivalent to the graph isomorphism, quadratic assignment and sparse matrix bandwidth reduction problems [11]. However, this approach does not consider the precedence constraints among the tasks; it can be useful for assigning clusters of tasks to the processors if the tasks have already been scheduled into clusters.

[Figure 1.2: (a) A taxonomy of the approaches to the scheduling problem: parallel program scheduling splits into job scheduling (independent tasks) and scheduling and mapping (multiple interacting tasks); the latter splits into dynamic and static scheduling, and static scheduling uses either the task interaction graph or the task precedence graph model; (b) A task interaction graph; (c) A task precedence graph.]

Furthermore, since the temporal dependencies within the clusters are ignored, this approach cannot consider the sequencing of messages and the contention on the communication channels. With the task precedence graph model, a parallel program is viewed as a directed acyclic graph, in which the nodes represent the tasks and the directed edges represent the execution dependencies as well as the amounts of communication. Thus, in the task

precedence graph shown in Figure 1.2(c), task n_4 cannot commence execution before tasks n_2 and n_3 finish execution and it gathers all the communication data from n_2 and n_3. The scheduling objective is again to properly schedule the tasks in time and space so as to minimize the program completion time or, equivalently, maximize the speedup, which is defined as the time required for sequential execution divided by the time required for parallel execution. For most parallel applications, a task precedence graph can model the program more accurately because it captures the temporal dependencies among tasks. This is the model we use for the scheduling problem addressed in this thesis.

1.4 Outline of the Thesis

In Chapter 2, we present a discussion of the evolution of the static scheduling problem. We describe the earlier proposed classical scheduling algorithms as well as the current state of the art, and we briefly discuss the merits and limitations of these algorithms. In Chapter 3, we present our proposed Dynamic Critical Path (DCP) scheduling algorithm. We first describe the design principles of our algorithm. Then, we present the DCP algorithm, followed by an application example to illustrate the algorithm's effectiveness. Finally, the results and comparisons of the performance of our algorithm with other algorithms on a large set of task graphs with various types of structures are presented. In Chapter 4, we present our proposed Critical Path Fast Duplication (CPFD) scheduling algorithm. We first discuss the potential benefits of using task duplication in scheduling. Afterwards, we discuss the design principles of the CPFD algorithm. We then describe the CPFD algorithm and its variants, which are designed to tackle the cases of a limited number of homogeneous or heterogeneous processors. We also illustrate the functionality of the CPFD algorithm by presenting an application example. Finally, we present and discuss the experiments we conducted to investigate the performance of our algorithms compared with other duplication-based algorithms. In Chapter 5, we present our proposed Bubble Scheduling and Allocation (BSA) algorithm. We first discuss the issues of scheduling under realistic system constraints. We then describe the principles we used in designing an efficient and robust scheduling and mapping algorithm. Afterwards, we describe the BSA algorithm, followed by an application example which illustrates the superior performance of the BSA algorithm over other algorithms. Finally, we present and compare the performance of the BSA algorithm and two other algorithms. Chapter 6 concludes the thesis and suggests future research directions.

Chapter 2 Evolution of the Scheduling Problem

2.1 Introduction

In this chapter, we present a discussion of the evolution of the static scheduling problem by describing scheduling algorithms of different generations. First, we describe earlier reported scheduling algorithms, which were designed based on simplifying assumptions about the task graphs as well as the underlying multiprocessor systems. Second, we describe the succeeding generation of scheduling algorithms, which cannot guarantee optimal solutions but can be applied to more realistic environments. These algorithms are also called scheduling heuristics. Essentially, these heuristics extend previous ideas to adopt more realistic constraints, such as arbitrary computation costs, and take communication among tasks into consideration. Third, we describe four state-of-the-art scheduling algorithms, which were recently reported and have been shown to be efficient compared with many other algorithms. We also describe two recently reported scheduling algorithms that are based on task duplication. Finally, we describe two contemporary scheduling algorithms that also consider the scheduling of communication edges on communication links.

The scheduling problem existed even before the advent of parallel computers, since the allocation of a set of tasks to a single processor is also a non-trivial problem [29]. Static scheduling is an old problem and has been studied extensively in the operations research community for a long time. The scheduling problem in the context of parallel computers has benefited from the approaches employed in operations research, since the allocation of parallel programs to parallel processors is analogous to the allocation of a set of jobs to a set of machines. However, such approaches assume a very simple model of the parallel computer

[16], [19]. Over the years, with the rapid advancements in computer architectures, the scheduling problem has evolved through a number of generations. Every scheduling algorithm reported in the early literature works under different circumstances and assumptions. However, there are three fundamental questions to ask about a scheduling algorithm: (1) does the algorithm make realistic assumptions? (2) is it sophisticated enough to capture the architectural details of the system? and (3) does the complexity of the algorithm permit it to be practically used for compile-time scheduling?

The first question relates to the assumptions made by the scheduling algorithm about the parallel program and architecture models. As elaborated in later sections, earlier scheduling algorithms made simplifying assumptions such as equal computation times for all the tasks in the task graph, simple graph structures such as trees, or ignoring the communication delays among tasks altogether. Similarly, scheduling strategies that ignore precedence relations among tasks and the contention on the communication links of the system may work only in certain environments.

The second question is concerned with the optimization of the scheduling strategy with respect to the target architecture. The scheduling problem is not only problem-dependent but also machine-dependent. A scheduling algorithm tailored for one particular architecture may not generate efficient solutions on another architecture. Recently, a wide variety of architectures have emerged employing various design methodologies. Architectural attributes such as the system topology, the routing strategy, and overlapped communication and computation, if taken into account, can result in different allocation decisions. Therefore, more sophisticated algorithms are required that can optimize the allocation strategy for a given machine architecture.

The third question, which relates to the complexity of the heuristic, is an important consideration. For example, consider the task graph for Gaussian elimination [70]. The task graph for Gaussian elimination on a 4 by 4 matrix consists of 18 nodes and 28 edges. The number of nodes, n, in the task graph is roughly O(N^2) for a matrix of dimension N. Thus, a scheduling algorithm whose complexity is O(n^2) will have a complexity of O(N^4). A number of reported scheduling algorithms exhibit good performance by considering only a set of small task graphs. Such algorithms do not carry enough potential to be used for practical purposes. Some low-complexity algorithms, on the other end of the spectrum, do not always

perform well. We would like to design efficient algorithms whose complexities are low enough to make them scalable for scheduling very large task graphs on a large number of processors.

In the beginning, the architecture of the parallel machine and the parallel program were represented in a very abstract fashion. In order to tackle the problem, simplifying assumptions were made regarding the task graph structure representing the program and the model of the parallel processor system [], [24], [25], [33]. However, the problem is NP-complete even in two simple cases: (1) scheduling unit-time tasks to an arbitrary number of processors, and (2) scheduling one- or two-time-unit tasks to two processors [46]. There are only two special cases for which optimal polynomial-time algorithms exist: scheduling tree-structured task graphs with identical computation costs on an arbitrary number of processors [36], and scheduling arbitrary task graphs with identical computation costs on two processors [19]. However, even in these cases, no communication is assumed among the tasks of the parallel program [].

There are many approaches that can be employed in static scheduling [27], [28]. These include queuing theory, graph-theoretic approaches, mathematical programming [34], [35] and state-space search [31], []. In the classical approach [1], [53], which is also called list scheduling, the basic idea is to make an ordered list of nodes by assigning them priorities, and then repeatedly execute the following two steps until a valid schedule is obtained:

1) Select from the list the node with the highest priority for scheduling.
2) Select a processor to accommodate this node.

The priorities are determined statically before the scheduling process begins. In the scheduling process, the node with the highest priority is chosen for scheduling. In the second step, the best possible processor, that is, the one which allows the earliest start time, is selected to accommodate this node. Most of the earlier reported scheduling algorithms are based on this concept, employing variations in the priority assignment methods such as HLF (Highest Level First), LP (Longest Path), LPT (Longest Processing Time) and CP (Critical Path) [], [29]. However, static priority assignment may not always precisely order the nodes for scheduling according to their relative importance. A node is more important than other

nodes if timely scheduling of that node can eventually lead to a better schedule. The drawback of the static approach is that an inefficient schedule may be generated if a relatively less important node is chosen for scheduling before the more important ones. Static priority assignment fails to capture the variation in the relative importance of nodes during the scheduling process. In order to avoid scheduling less important nodes before more important ones, node priorities need to be determined dynamically during the scheduling process. The priorities of the nodes are re-computed after a node has been scheduled, in order to capture the changes in the relative importance of the remaining nodes. Thus, the following three steps are repeatedly executed in such scheduling algorithms:

1) Determine new priorities for all unscheduled nodes.
2) Select the node with the highest priority for scheduling.
3) Select the most suitable processor to accommodate this node.

Scheduling algorithms which employ the above three-step approach can potentially generate better schedules [43], [61]. However, this can increase the complexity of the algorithm.

2.2 Problem Statement

A parallel program is represented by a directed acyclic graph. A node in the parallel program graph represents a task, which is a set of instructions that must be executed serially in the same processor. Associated with each node is its computation cost, denoted by w(n_i), which indicates the amount of computation required. The edges in the parallel program graph correspond to the communication messages and precedence constraints among the nodes. Associated with each edge is a number indicating the amount of communication data sent from one node to another. This number is called the communication cost of the edge and is denoted by c_ij, where the subscript ij indicates that the directed edge emerges from the source node n_i and is incident on the destination node n_j. The source node and the destination node of an edge are called the parent node and the child node, respectively. The communication-to-computation ratio (CCR) of a parallel program is defined as its average communication cost divided by its average computation cost on a given system. We assume each processor in the system possesses dedicated communication hardware, so that communication can take place simultaneously with computation. In a task graph, a node which does not have any parent is called an entry node, while a node which does not have any child is called an exit node. A node cannot start execution before it gathers all of the messages from its

parent nodes. The communication cost between two nodes assigned to the same processor is assumed to be zero. The scheduling of nodes is non-preemptive: a task scheduled to a processor cannot be interrupted before its completion. The scheduling problem is defined as the allocation of a set of tasks to a set of processors such that the cumulative schedule length, or makespan, is minimized without violating the precedence constraints among the tasks. A schedule is considered efficient if the schedule length is short and the number of processors used is reasonable.

2.3 Optimal Static Scheduling Algorithms

There are notably few known polynomial-time scheduling algorithms for determining minimum-length schedules, even when severe constraints are imposed on the task graphs and the underlying parallel processing systems. Indeed, there are only two cases for which polynomial-time optimal scheduling algorithms are known: (1) when the task graph is a rooted tree, and (2) when there are only two processors available. In both cases, every task in the task graph has unit computation cost, and there is no communication assumed among the tasks. That is, w(n_i) = 1 and c_ij = 0 for all i and j. In the following, we describe the algorithms for these two highly simplified cases.

2.3.1 Optimal Scheduling of Tree-structured Task Graphs

Hu proposed a polynomial-time algorithm for determining minimum-length schedules for tree-structured task graphs with unit-cost tasks and without communication among tasks [36]. The first step in Hu's algorithm involves the labelling of the nodes. A node n_i is recursively given the label α_i = X_i + 1, where X_i is the length of the longest path from n_i to the exit node in the graph. Here, it should be noted that the rooted tree is assumed to be an in-tree; that is, each task in the graph has only one successor and there is only one exit node, which is the root of the tree. The labelling process begins with the exit node, which is given the label α_1 = 1. Nodes that are one edge above the exit node are given the label 2, and so on. It is clear that the minimum time T_min to process the graph is related to α_max, the highest numbered label, by the following inequality: T_min ≥ α_max.
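As an illustration of the labelling (a minimal sketch added here under the stated unit-cost, in-tree assumptions; the node names in the example are hypothetical), the labels α_i can be computed by walking each node down to the exit node, given the in-tree as a map from each node to its unique successor:

```python
def hu_labels(succ):
    # succ maps each node to its unique successor; the exit node maps to None.
    # alpha[i] = 1 + length of the path from node i down to the exit node.
    alpha = {}
    def label(i):
        if i not in alpha:
            alpha[i] = 1 if succ[i] is None else 1 + label(succ[i])
        return alpha[i]
    for i in succ:
        label(i)
    return alpha

# Hypothetical in-tree: n4 and n5 feed n2; n2 and n3 feed the exit node n1.
succ = {"n1": None, "n2": "n1", "n3": "n1", "n4": "n2", "n5": "n2"}
print(hu_labels(succ))  # {'n1': 1, 'n2': 2, 'n3': 2, 'n4': 3, 'n5': 3}
```

For this tree, α_max = 3, so by the inequality above no schedule, on any number of processors, can finish in fewer than 3 time units.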

Using the above labelling procedure, an optimal schedule can be obtained for m processors by processing a tree-structured task graph in the following steps:

(1) Schedule the first m (or fewer) nodes with the highest numbered labels, i.e., the entry nodes, to the processors. If the number of entry nodes is greater than m, choose the m nodes whose α_i are greater than the others'; in case of a tie, choose a node arbitrarily.
(2) Remove the m scheduled nodes from the graph. Treat the nodes with no predecessors as the new entry nodes.
(3) Repeat steps (1) and (2) until all nodes are scheduled.

The labelling process of the algorithm partitions the task graph into a number of levels. In the scheduling process, each level of tasks is assigned to the available processors. Schedules generated using the above steps are optimal under the stated constraints. This is illustrated by the simple task graph and its optimal schedule shown in Figure 2.1. The complexity of the algorithm is linear in the number of nodes, because each node in the task graph is visited a constant number of times.

[Figure 2.1: (a) A simple tree-structured task graph with unit-cost tasks and without communication among tasks; (b) The optimal schedule of the task graph using three processors.]

Although Hu's algorithm can generate optimal schedules, it is not useful in practical parallel processing systems because it requires too severe constraints on the structure of the parallel programs.
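Despite this restriction, the algorithm is compact. As a concrete illustration (again a sketch added here, reusing the hu_labels function and the hypothetical tree succ from the previous fragment), the three steps amount to repeatedly running the at most m ready nodes with the highest labels:

```python
def hu_schedule(succ, m):
    # Schedule the in-tree succ on m processors with Hu's algorithm.
    alpha = hu_labels(succ)
    pending, done, steps = set(succ), set(), []
    while pending:
        # A node is ready once all of its predecessors have been executed.
        ready = [i for i in pending
                 if all(j in done for j in succ if succ[j] == i)]
        # Pick the m ready nodes with the highest labels (ties broken by name).
        batch = sorted(ready, key=lambda i: (-alpha[i], i))[:m]
        steps.append(batch)
        done.update(batch)
        pending.difference_update(batch)
    return steps

print(hu_schedule(succ, 3))
# [['n4', 'n5', 'n3'], ['n2'], ['n1']] -- schedule length 3 = alpha_max
```

In this example the schedule meets the lower bound T_min ≥ α_max, as the optimality result guarantees for this restricted class of graphs.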

2.3.2 Optimal Scheduling for Two-processor Systems

Optimal results for static scheduling have also been obtained by Coffman, Graham and Sethi [19], [59]. They developed an algorithm for generating optimal schedules for arbitrary task graphs with unit-cost tasks and without communication among tasks on two-processor systems. Their algorithm works on principles similar to Hu's algorithm. The algorithm first assigns labels to each node in the task graph. The assignment process proceeds up the graph, considering as candidates for the assignment of the next label all nodes whose successors have already been assigned labels. After all nodes are assigned labels, a list is formed by ordering the tasks in decreasing label numbers, beginning with the last label assigned. The optimal schedule is then obtained by scheduling ready tasks in this list to idle processors. This is elaborated in the following steps:

(1) Assign label 1 to one of the exit nodes.
(2) Assume that labels 1, 2, ..., j-1 have been assigned. Let S be the set of unassigned nodes with no unlabelled successors. Select an element of S to be assigned label j as follows: for each node x in S, let y_1, y_2, ..., y_k be the immediate successors of x; then define l(x) to be the decreasing sequence of integers formed by ordering the labels of the y's. Suppose that l(x) ≤ l(x') lexicographically for all x' in S; then assign the label j to x.
(3) After all tasks have been labelled, use the list of tasks in descending order of labels for scheduling. Beginning from the first task in the list, schedule each task to the one of the two given processors that allows the earlier execution of the task.

Schedules generated using the above algorithm are optimal under the given constraints. An example is illustrated in Figure 2.2. Through the use of counter-examples, Coffman and Graham demonstrated that their algorithm can generate sub-optimal solutions when the number of processors is increased to three or more, or when the number of processors is two and tasks are allowed to have arbitrary computation costs. This is true even when the computation costs are restricted to one or two units. The complexity of the algorithm is O(n^2), where n is the number of nodes in the task graph. Similar to Hu's algorithm, the above algorithm is not useful for modern parallel processing environments in which there may be many processing elements.

2.4 Heuristic Approaches

Since the scheduling problem has been shown to be computationally intractable even in very

[Figure 2.2: (a) A simple task graph with unit-cost tasks and without communication among tasks; (b) The optimal schedule of the task graph in a two-processor system.]

simple cases [], researchers in this area resort to heuristic approaches, from which reasonably good schedules can be obtained in a very efficient manner [60]. Adam, Chandy and Dickson [1] performed extensive simulations to compare the performance of several such earlier reported scheduling heuristics: the Highest Levels First with Estimated Times (HLFET), the Highest Levels First with No Estimated Times (HLFNET), Random list scheduling, the Smallest Co-levels First with Estimated Times (SCFET), and the Smallest Co-levels First with No Estimated Times (SCFNET). All of these heuristics take task graphs of arbitrary structure with arbitrary task computation costs as input and schedule the graphs to an arbitrary number of processors. Each heuristic first constructs a list of nodes and schedules the nodes one after the other in a way similar to Hu's algorithm. However, it should be noted that no communication is assumed among the nodes in the task graph. Each heuristic assigns node priorities in a different manner, as elaborated below.

1) In the HLFET algorithm, the term level refers to the sum of the computation costs of all nodes on the longest path from a node to an exit node. This level is used as the priority of the node.
2) The HLFNET algorithm is the same as the HLFET algorithm except that all nodes are assumed to have equal computation costs.
3) In the Random list scheduling algorithm, nodes are assigned random priorities.
4) In the SCFET algorithm, the co-level of a node is calculated in the same way as its level, except that the length of the path is computed from the entry node rather than from the exit node. Node priorities are assigned according to co-levels (i.e., the smaller the co-level, the higher the priority).

5) The SCFNET algorithm is the same as the SCFET algorithm except that all nodes are assumed to have equal computation costs. This is equivalent to an earliest precedence partition when computation costs are ignored.

In their study, Adam et al. defined a scheduling heuristic as near-optimal if the schedule lengths obtained by the heuristic are within five percent of the optimal schedule lengths in 90 percent of the cases. Using this criterion, extensive simulations based on real and randomly generated task graphs showed that the order of accuracy of the five algorithms is: HLFET, HLFNET, SCFNET, Random, SCFET. The near-optimal performance of HLFET indicates that longest-path (LP) scheduling can generate better schedules [8], [60]. This was also confirmed by Kohler [41]. This important attribute of a task graph is also used to design efficient scheduling algorithms which work under more realistic constraints, as elaborated in the next section.

2.5 State-of-the-Art Scheduling Algorithms

In this section, four contemporary scheduling algorithms and their characteristics are described: the Edge-zeroing (EZ) algorithm [56], the Modified Critical Path (MCP) algorithm [70], the Mobility Directed (MD) algorithm [70], and the Dominant Sequence Clustering (DSC) algorithm [67]. These algorithms can be considered the state of the art because they were recently reported and have been shown to be efficient for scheduling arbitrary task graphs. Furthermore, all of these algorithms consider arbitrary communication among tasks. Taking communication into consideration makes a scheduling algorithm more sophisticated [6]. Before describing these algorithms, we discuss the reasons that motivated their development. Although the HLFET algorithm has near-optimal performance, it cannot generate efficient schedules for task graphs with communication among tasks. For example, consider the task graph shown in Figure 2.3(a). Here, a schedule is produced using the HLFET algorithm (communication costs are incorporated in calculating the levels of nodes). The schedule is shown in Figure 2.3(b), in which all the nodes are scheduled to one processor and the schedule length is 43 time units. The HLFET algorithm schedules the nodes in the order: n_1, n_2, n_3, n_4. However, the schedule length can be reduced by using one more processor. This can be seen from the schedule shown in Figure 2.3(c). This schedule, which is generated by hand, is produced according to the order: n_1, n_3, n_2, n_4. At the second scheduling step, n_3 is a relatively more important node than n_2 because, if it is not scheduled to start earlier on a


More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Scheduling on clusters and grids

Scheduling on clusters and grids Some basics on scheduling theory Grégory Mounié, Yves Robert et Denis Trystram ID-IMAG 6 mars 2006 Some basics on scheduling theory 1 Some basics on scheduling theory Notations and Definitions List scheduling

More information

Critical Path Scheduling Parallel Programs on an Unbounded Number of Processors

Critical Path Scheduling Parallel Programs on an Unbounded Number of Processors Critical Path Scheduling Parallel Programs on an Unbounded Number of Processors Mourad Hakem, Franck Butelle To cite this version: Mourad Hakem, Franck Butelle. Critical Path Scheduling Parallel Programs

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY Joseph Michael Wijayantha Medagama (08/8015) Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science

More information

On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems.

On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems. On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems. Andrei Rădulescu Arjan J.C. van Gemund Faculty of Information Technology and Systems Delft University of Technology P.O.Box

More information

Parallel Job Scheduling

Parallel Job Scheduling Parallel Job Scheduling Lectured by: Nguyễn Đức Thái Prepared by: Thoại Nam -1- Scheduling on UMA Multiprocessors Schedule: allocation of tasks to processors Dynamic scheduling A single queue of ready

More information

Controlled duplication for scheduling real-time precedence tasks on heterogeneous multiprocessors

Controlled duplication for scheduling real-time precedence tasks on heterogeneous multiprocessors Controlled duplication for scheduling real-time precedence tasks on heterogeneous multiprocessors Jagpreet Singh* and Nitin Auluck Department of Computer Science & Engineering Indian Institute of Technology,

More information

ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS

ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS Prabodha Srimal Rodrigo Registration No. : 138230V Degree of Master of Science Department of Computer Science & Engineering University

More information

The Automatic Design of Batch Processing Systems

The Automatic Design of Batch Processing Systems The Automatic Design of Batch Processing Systems by Barry Dwyer, M.A., D.A.E., Grad.Dip. A thesis submitted for the degree of Doctor of Philosophy in the Department of Computer Science University of Adelaide

More information

Scheduling Algorithms in Large Scale Distributed Systems

Scheduling Algorithms in Large Scale Distributed Systems Scheduling Algorithms in Large Scale Distributed Systems Prof.dr.ing. Florin Pop University Politehnica of Bucharest, Faculty of Automatic Control and Computers (ACS-UPB) National Institute for Research

More information

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Tony Maciejewski, Kyle Tarplee, Ryan Friese, and Howard Jay Siegel Department of Electrical and Computer Engineering Colorado

More information

A Fuzzy Logic Approach to Assembly Line Balancing

A Fuzzy Logic Approach to Assembly Line Balancing Mathware & Soft Computing 12 (2005), 57-74 A Fuzzy Logic Approach to Assembly Line Balancing D.J. Fonseca 1, C.L. Guest 1, M. Elam 1, and C.L. Karr 2 1 Department of Industrial Engineering 2 Department

More information

11. APPROXIMATION ALGORITHMS

11. APPROXIMATION ALGORITHMS 11. APPROXIMATION ALGORITHMS load balancing center selection pricing method: vertex cover LP rounding: vertex cover generalized load balancing knapsack problem Lecture slides by Kevin Wayne Copyright 2005

More information

A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems

A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems Yi-Hsuan Lee and Cheng Chen Department of Computer Science and Information Engineering National Chiao Tung University, Hsinchu,

More information

EFFICIENT ATTACKS ON HOMOPHONIC SUBSTITUTION CIPHERS

EFFICIENT ATTACKS ON HOMOPHONIC SUBSTITUTION CIPHERS EFFICIENT ATTACKS ON HOMOPHONIC SUBSTITUTION CIPHERS A Project Report Presented to The faculty of the Department of Computer Science San Jose State University In Partial Fulfillment of the Requirements

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

DEGENERACY AND THE FUNDAMENTAL THEOREM

DEGENERACY AND THE FUNDAMENTAL THEOREM DEGENERACY AND THE FUNDAMENTAL THEOREM The Standard Simplex Method in Matrix Notation: we start with the standard form of the linear program in matrix notation: (SLP) m n we assume (SLP) is feasible, and

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Design of Parallel Algorithms. Models of Parallel Computation

Design of Parallel Algorithms. Models of Parallel Computation + Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms and Concurrency n Introduction to Parallel Algorithms n Tasks and Decomposition n Processes and Mapping n Processes

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 Load Balancing and Termination Detection 1 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

Parallel Algorithm Design. Parallel Algorithm Design p. 1

Parallel Algorithm Design. Parallel Algorithm Design p. 1 Parallel Algorithm Design Parallel Algorithm Design p. 1 Overview Chapter 3 from Michael J. Quinn, Parallel Programming in C with MPI and OpenMP Another resource: http://www.mcs.anl.gov/ itf/dbpp/text/node14.html

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Scheduling in Distributed Computing Systems Analysis, Design & Models

Scheduling in Distributed Computing Systems Analysis, Design & Models Scheduling in Distributed Computing Systems Analysis, Design & Models (A Research Monograph) Scheduling in Distributed Computing Systems Analysis, Design & Models (A Research Monograph) by Deo Prakash

More information

CSC630/CSC730 Parallel & Distributed Computing

CSC630/CSC730 Parallel & Distributed Computing CSC630/CSC730 Parallel & Distributed Computing Analytical Modeling of Parallel Programs Chapter 5 1 Contents Sources of Parallel Overhead Performance Metrics Granularity and Data Mapping Scalability 2

More information

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 1 Today Characteristics of Tasks and Interactions (3.3). Mapping Techniques for Load Balancing (3.4). Methods for Containing Interaction

More information

A Genetic Algorithm for Multiprocessor Task Scheduling

A Genetic Algorithm for Multiprocessor Task Scheduling A Genetic Algorithm for Multiprocessor Task Scheduling Tashniba Kaiser, Olawale Jegede, Ken Ferens, Douglas Buchanan Dept. of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB,

More information

Resource Allocation Strategies for Multiple Job Classes

Resource Allocation Strategies for Multiple Job Classes Resource Allocation Strategies for Multiple Job Classes by Ye Hu A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer

More information

Algorithms and Applications

Algorithms and Applications Algorithms and Applications 1 Areas done in textbook: Sorting Algorithms Numerical Algorithms Image Processing Searching and Optimization 2 Chapter 10 Sorting Algorithms - rearranging a list of numbers

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

On Universal Cycles of Labeled Graphs

On Universal Cycles of Labeled Graphs On Universal Cycles of Labeled Graphs Greg Brockman Harvard University Cambridge, MA 02138 United States brockman@hcs.harvard.edu Bill Kay University of South Carolina Columbia, SC 29208 United States

More information

A Connection between Network Coding and. Convolutional Codes

A Connection between Network Coding and. Convolutional Codes A Connection between Network Coding and 1 Convolutional Codes Christina Fragouli, Emina Soljanin christina.fragouli@epfl.ch, emina@lucent.com Abstract The min-cut, max-flow theorem states that a source

More information

Parallel Program Execution on a Heterogeneous PC Cluster Using Task Duplication

Parallel Program Execution on a Heterogeneous PC Cluster Using Task Duplication Parallel Program Execution on a Heterogeneous PC Cluster Using Task Duplication YU-KWONG KWOK Department of Electrical and Electronic Engineering The University of Hong Kong, Pokfulam Road, Hong Kong Email:

More information

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks A Parallel Algorithm for Exact Structure Learning of Bayesian Networks Olga Nikolova, Jaroslaw Zola, and Srinivas Aluru Department of Computer Engineering Iowa State University Ames, IA 0010 {olia,zola,aluru}@iastate.edu

More information

SCHEDULING OF PRECEDENCE CONSTRAINED TASK GRAPHS ON MULTIPROCESSOR SYSTEMS

SCHEDULING OF PRECEDENCE CONSTRAINED TASK GRAPHS ON MULTIPROCESSOR SYSTEMS ISSN : 0973-7391 Vol. 3, No. 1, January-June 2012, pp. 233-240 SCHEDULING OF PRECEDENCE CONSTRAINED TASK GRAPHS ON MULTIPROCESSOR SYSTEMS Shailza Kamal 1, and Sukhwinder Sharma 2 1 Department of Computer

More information

Grid Scheduling Strategy using GA (GSSGA)

Grid Scheduling Strategy using GA (GSSGA) F Kurus Malai Selvi et al,int.j.computer Technology & Applications,Vol 3 (5), 8-86 ISSN:2229-693 Grid Scheduling Strategy using GA () Dr.D.I.George Amalarethinam Director-MCA & Associate Professor of Computer

More information

Course: Operating Systems Instructor: M Umair. M Umair

Course: Operating Systems Instructor: M Umair. M Umair Course: Operating Systems Instructor: M Umair Process The Process A process is a program in execution. A program is a passive entity, such as a file containing a list of instructions stored on disk (often

More information

Implementation of Dynamic Level Scheduling Algorithm using Genetic Operators

Implementation of Dynamic Level Scheduling Algorithm using Genetic Operators Implementation of Dynamic Level Scheduling Algorithm using Genetic Operators Prabhjot Kaur 1 and Amanpreet Kaur 2 1, 2 M. Tech Research Scholar Department of Computer Science and Engineering Guru Nanak

More information

Parallel Fast Fourier Transform implementations in Julia 12/15/2011

Parallel Fast Fourier Transform implementations in Julia 12/15/2011 Parallel Fast Fourier Transform implementations in Julia 1/15/011 Abstract This paper examines the parallel computation models of Julia through several different multiprocessor FFT implementations of 1D

More information

A SIMULATION OF POWER-AWARE SCHEDULING OF TASK GRAPHS TO MULTIPLE PROCESSORS

A SIMULATION OF POWER-AWARE SCHEDULING OF TASK GRAPHS TO MULTIPLE PROCESSORS A SIMULATION OF POWER-AWARE SCHEDULING OF TASK GRAPHS TO MULTIPLE PROCESSORS Xiaojun Qi, Carson Jones, and Scott Cannon Computer Science Department Utah State University, Logan, UT, USA 84322-4205 xqi@cc.usu.edu,

More information

Framework for Design of Dynamic Programming Algorithms

Framework for Design of Dynamic Programming Algorithms CSE 441T/541T Advanced Algorithms September 22, 2010 Framework for Design of Dynamic Programming Algorithms Dynamic programming algorithms for combinatorial optimization generalize the strategy we studied

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

Multi-Way Number Partitioning

Multi-Way Number Partitioning Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09) Multi-Way Number Partitioning Richard E. Korf Computer Science Department University of California,

More information

The Pulled-Macro-Dataflow Model: An Execution Model for Multicore Shared-Memory Computers

The Pulled-Macro-Dataflow Model: An Execution Model for Multicore Shared-Memory Computers Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2011-09-13 The Pulled-Macro-Dataflow Model: An Execution Model for Multicore Shared-Memory Computers Daniel Joseph Richins Brigham

More information

GEO BASED ROUTING FOR BORDER GATEWAY PROTOCOL IN ISP MULTI-HOMING ENVIRONMENT

GEO BASED ROUTING FOR BORDER GATEWAY PROTOCOL IN ISP MULTI-HOMING ENVIRONMENT GEO BASED ROUTING FOR BORDER GATEWAY PROTOCOL IN ISP MULTI-HOMING ENVIRONMENT Duleep Thilakarathne (118473A) Degree of Master of Science Department of Electronic and Telecommunication Engineering University

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

A Static Scheduling Heuristic for. Heterogeneous Processors. Hyunok Oh and Soonhoi Ha

A Static Scheduling Heuristic for. Heterogeneous Processors. Hyunok Oh and Soonhoi Ha 1 Static Scheduling Heuristic for Heterogeneous Processors Hyunok Oh and Soonhoi Ha The Department of omputer Engineering, Seoul National University, Seoul, 11-742, Korea: e-mail: foho,shag@comp.snu.ac.kr

More information

A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System

A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System 第一工業大学研究報告第 27 号 (2015)pp.13-17 13 A Task Scheduling Method for Data Intensive Jobs in Multicore Distributed System Kazuo Hajikano* 1 Hidehiro Kanemitsu* 2 Moo Wan Kim* 3 *1 Department of Information Technology

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 4 ] Scheduling Theory Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture and

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

Scheduling Using Multi Objective Genetic Algorithm

Scheduling Using Multi Objective Genetic Algorithm IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. II (May Jun. 2015), PP 73-78 www.iosrjournals.org Scheduling Using Multi Objective Genetic

More information

High Level Synthesis

High Level Synthesis High Level Synthesis Design Representation Intermediate representation essential for efficient processing. Input HDL behavioral descriptions translated into some canonical intermediate representation.

More information

Lecture 3: Sorting 1

Lecture 3: Sorting 1 Lecture 3: Sorting 1 Sorting Arranging an unordered collection of elements into monotonically increasing (or decreasing) order. S = a sequence of n elements in arbitrary order After sorting:

More information

Galgotias University: (U.P. India) Department of Computer Science & Applications

Galgotias University: (U.P. India) Department of Computer Science & Applications The Society of Digital Information and Wireless Communications, (ISSN: -98) A Critical-Path and Top-Level attributes based Task Scheduling Algorithm for DAG (CPTL) Nidhi Rajak, Ranjit Rajak and Anurag

More information

Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems

Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems Task Allocation for Minimizing Programs Completion Time in Multicomputer Systems Gamal Attiya and Yskandar Hamam Groupe ESIEE Paris, Lab. A 2 SI Cité Descartes, BP 99, 93162 Noisy-Le-Grand, FRANCE {attiyag,hamamy}@esiee.fr

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

CHAPTER 6 ORTHOGONAL PARTICLE SWARM OPTIMIZATION

CHAPTER 6 ORTHOGONAL PARTICLE SWARM OPTIMIZATION 131 CHAPTER 6 ORTHOGONAL PARTICLE SWARM OPTIMIZATION 6.1 INTRODUCTION The Orthogonal arrays are helpful in guiding the heuristic algorithms to obtain a good solution when applied to NP-hard problems. This

More information

Cost Models for Query Processing Strategies in the Active Data Repository

Cost Models for Query Processing Strategies in the Active Data Repository Cost Models for Query rocessing Strategies in the Active Data Repository Chialin Chang Institute for Advanced Computer Studies and Department of Computer Science University of Maryland, College ark 272

More information

Hashing. Hashing Procedures

Hashing. Hashing Procedures Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements

More information

A DAG-BASED ALGORITHM FOR DISTRIBUTED MUTUAL EXCLUSION ATHESIS MASTER OF SCIENCE

A DAG-BASED ALGORITHM FOR DISTRIBUTED MUTUAL EXCLUSION ATHESIS MASTER OF SCIENCE A DAG-BASED ALGORITHM FOR DISTRIBUTED MUTUAL EXCLUSION by Mitchell L. Neilsen ATHESIS submitted in partial fulfillment of the requirements for the degree MASTER OF SCIENCE Department of Computing and Information

More information

Mapping of Parallel Tasks to Multiprocessors with Duplication *

Mapping of Parallel Tasks to Multiprocessors with Duplication * Mapping of Parallel Tasks to Multiprocessors with Duplication * Gyung-Leen Park Dept. of Comp. Sc. and Eng. Univ. of Texas at Arlington Arlington, TX 76019-0015 gpark@cse.uta.edu Behrooz Shirazi Dept.

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Given an NP-hard problem, what should be done? Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one of three desired features. Solve problem to optimality.

More information

Efficient Non-domination Level Update Approach for Steady-State Evolutionary Multiobjective Optimization

Efficient Non-domination Level Update Approach for Steady-State Evolutionary Multiobjective Optimization Efficient Non-domination Level Update Approach for Steady-State Evolutionary Multiobjective Optimization Ke Li 1, Kalyanmoy Deb 1, Qingfu Zhang 2, and Sam Kwong 2 1 Department of Electrical and Computer

More information

ROUTING ALGORITHMS FOR RING NETWORKS

ROUTING ALGORITHMS FOR RING NETWORKS ROUTING ALGORITHMS FOR RING NETWORKS by Yong Wang B.Sc., Peking University, 1999 a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Computing

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

CS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018

CS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018 CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved.

More information

Performing MapReduce on Data Centers with Hierarchical Structures

Performing MapReduce on Data Centers with Hierarchical Structures INT J COMPUT COMMUN, ISSN 1841-9836 Vol.7 (212), No. 3 (September), pp. 432-449 Performing MapReduce on Data Centers with Hierarchical Structures Z. Ding, D. Guo, X. Chen, X. Luo Zeliu Ding, Deke Guo,

More information

Energy-Constrained Scheduling of DAGs on Multi-core Processors

Energy-Constrained Scheduling of DAGs on Multi-core Processors Energy-Constrained Scheduling of DAGs on Multi-core Processors Ishfaq Ahmad 1, Roman Arora 1, Derek White 1, Vangelis Metsis 1, and Rebecca Ingram 2 1 University of Texas at Arlington, Computer Science

More information

HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS

HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS An Undergraduate Research Scholars Thesis by DENISE IRVIN Submitted to the Undergraduate Research Scholars program at Texas

More information

Randomized algorithms have several advantages over deterministic ones. We discuss them here:

Randomized algorithms have several advantages over deterministic ones. We discuss them here: CS787: Advanced Algorithms Lecture 6: Randomized Algorithms In this lecture we introduce randomized algorithms. We will begin by motivating the use of randomized algorithms through a few examples. Then

More information