Provably Efficient Non-Preemptive Task Scheduling with Cilk
V.-Y. Vee and W.-J. Hsu
School of Applied Science, Nanyang Technological University
Nanyang Avenue, Singapore

Abstract

We consider the problem of scheduling static task graphs by using Cilk, a C-based runtime system for multithreaded parallel programming. We assume no preemption of task execution and no prior knowledge of the task execution times. Given a task graph G, the output of the scheduling algorithm is a Cilk program P which, when executed, initiates the tasks consistently with the precedence requirements of G. We show that the Cilk model has restrictions in implementing optimal schedules for certain types of task graphs; however, these restrictions do not fundamentally hinder the practical application of Cilk, as it is still possible to produce schedules of reasonably good quality (in the sense of expected execution time). Our algorithm identifies a minimal number of stages, assigns tasks to these stages, and bundles parallel tasks of the same stage into one Cilk procedure. By using Tarjan's algorithm (for disjoint-set operations) to implement the bundling process, we demonstrate that the parallel schedule can be derived in $O(n+e)$ time for all practical purposes, where n and e denote the number of nodes and edges in the task graph G. With P processors, the expected completion time for the scheduled tasks is bounded by $T_P = O(T_1/P + S)$, where $T_1$ denotes the total work, i.e., the time required for executing all tasks on a single processor, and S denotes the sum (over all stages) of the longest execution time of the tasks at each stage. When the execution times of tasks are relatively homogeneous, the schedule generated by our approach is nearly optimal.

Index Terms: Scheduling, Cilk.

1 Introduction

Task scheduling is one of the most challenging problems in parallel and distributed computing [1].
The scheduling of an arbitrary task graph on an arbitrary number of processors is NP-complete [1,6]. We introduce a suboptimal approach to task scheduling based on Cilk, a provably efficient runtime system for multithreaded programming [2,4]. A set of tasks T is specified as a strict (irreflexive) partially ordered set (T, <). The relation u < v denotes that the computation of task v depends on the results of the computation of task u, i.e., < specifies precedence constraints. If u < v, u is said to be a predecessor of v, and v a successor of u. The partial order < is conveniently represented as a directed acyclic graph called a task graph. A directed edge (i, j) from task $T_i$ to $T_j$ specifies that $T_i < T_j$. Two tasks $T_i$ and $T_j$ are comparable if either $T_i < T_j$ or $T_j < T_i$ holds (possibly by transitivity); otherwise, they are incomparable. A feasible schedule must preserve all precedence relations. A schedule is efficient if it minimizes the total execution time. Given a task graph, the classic task-scheduling problem is to find a feasible and efficient schedule for the assignment of the tasks to the processors.

We present an efficient scheduling algorithm for implementing arbitrary task graphs by using Cilk, a C-based runtime system for multithreaded parallel programming [2,4]. Given a task graph G, the output of the scheduling algorithm is a Cilk program P which, when executed, initiates the tasks in a manner consistent with the precedence requirements of G.

1.2 The Cilk Model

A Cilk multithreaded computation is viewed as a series-parallel dag that unfolds dynamically as the computation progresses. Threads are embedded in a tree of Cilk procedures. Each thread is a nonblocking C function. A thread can spawn other Cilk threads, each of which begins a new child procedure. The Cilk runtime system uses a provably efficient scheduler based on randomized work stealing [5]. Because of this, the schedules of task graphs generated by our approach are also nondeterministic. Nevertheless, the order in which the tasks are invoked is guaranteed to be consistent with the given task graph.

The Cilk runtime system provides a performance model based on two parameters: work and critical path length. The work $T_1$ is the total time required to execute the program on one processor. The critical path length $T_\infty$ is the time required for executing the threads along the longest dependency path in the dag. It has been shown theoretically [2] that Cilk's work-stealing scheduler executes a Cilk computation on P processors in expected time $T_P = O(T_1/P + T_\infty)$, which is asymptotically optimal. It has also been verified empirically that the constant factor hidden by the order notation is small, so that $T_1/P + T_\infty$ is a good approximation of the execution time on P processors [2].

Another crucial property of the Cilk model is the dag-consistent distributed shared memory model [3,7].
It is a lock-free consistency model suitable for multithreaded programming environments. The idea is that each thread sees values that are consistent with some serial execution order of the dag, but two different threads may see different serial orders. Thus, the writes performed by a thread are seen by its successors, but threads that are incomparable in the dag may or may not see each other's writes. However, Theorem 1.1 shows that the dag-consistency model imposes a basic restriction on implementing an optimal schedule for general task graphs.

Theorem 1.1 There exist task graphs for which an optimal schedule cannot be implemented by using Cilk procedures.

Proof. (Omitted. Cf. [11])

Theorem 1.1 holds because Cilk is designed for tree-like computations rather than for computations with an arbitrary dag structure. Given this, the best we can attempt is to implement a suboptimal task scheduler with Cilk.

2 The Scheduling Algorithm

Our algorithm consists of two phases. In the first phase, we apply a bundling algorithm, which is a dag-to-tree transformation, to transform the given task graph into a bundle tree. In the second phase, we map the bundle tree to the Cilk model of multithreaded computation. We explain the main ideas of the algorithms here; the details can be found in [11].

2.1 Bundling Algorithm

We apply a bundling algorithm to perform our dag-to-tree transformation. First, partition the nodes of the task graph into subsets called bundles. All of the tasks that belong to the same bundle must be incomparable, so that they can be executed in parallel. Then, if we treat each bundle as a node of a new graph G', and add an edge from bundle $B_i$ to bundle $B_j$ if any task in $B_i$ is a predecessor of a task in $B_j$, the graph G' would be tree-structured. With the bundle tree, the tasks can then be scheduled in Cilk easily.

Bottom-up Bundling

Assign each task in the task graph a stage such that the tasks in the same stage are incomparable to each other. The stage number of a given task node is defined to be one greater than the maximum stage number of all its successors, which can be determined by a breadth-first topological sort [10, 11]. If u < v, then u has a strictly larger stage number than v, so tasks with equal stage numbers are indeed incomparable. In Bottom-up Bundling, we consider the tasks in the last stage (the tasks with no successors) first and proceed towards the earliest stage in a stage-by-stage fashion. As the bundling progresses, the bundles form several trees (referred to as partial bundle trees). Finally, these are linked together to form one final bundle tree.
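The stage numbering described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `assign_stages`, the integer task identifiers, and the convention that tasks with no successors receive stage 1 are assumptions. The sketch processes tasks in a breadth-first (Kahn-style) topological order of the reversed graph, so every successor of a task is numbered before the task itself, and it runs in $O(n+e)$ time.

```python
from collections import deque


def assign_stages(n, edges):
    """Return stage[t] for tasks 0..n-1 of a task graph.

    edges holds pairs (u, v) meaning u < v (u must finish before v starts).
    Assumed convention: a task with no successors gets stage 1; any other task
    gets 1 + the maximum stage of its successors.
    """
    succ_count = [0] * n                 # successors not yet numbered
    preds = [[] for _ in range(n)]
    for u, v in edges:
        succ_count[u] += 1
        preds[v].append(u)

    stage = [1] * n
    queue = deque(t for t in range(n) if succ_count[t] == 0)  # sinks first
    numbered = 0
    while queue:                         # breadth-first over the reversed graph
        v = queue.popleft()
        numbered += 1
        for u in preds[v]:
            stage[u] = max(stage[u], stage[v] + 1)
            succ_count[u] -= 1
            if succ_count[u] == 0:       # all successors of u are numbered
                queue.append(u)
    if numbered != n:
        raise ValueError("task graph contains a cycle")
    return stage
```

For the five-task graph with edges 0→1, 0→2, 1→3, 2→3 and 4→2, the call `assign_stages(5, [(0, 1), (0, 2), (1, 3), (2, 3), (4, 2)])` returns `[3, 2, 2, 1, 3]`: tasks 0 and 4 form the earliest stage and task 3 the last.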
At each iteration, visit a task node t that has no successor or whose successors have all been visited. Create a new bundle $B_n$ containing t (MAKESET(t)). Then check all the bundles that embed its immediate successors x (FINDSET(x)), in order to determine the relation between $B_n$ and these bundles. For a bundle that embeds a successor x, find the root bundle $B_r$ of the partial bundle tree that contains x (FINDROOT(x)). If $B_n \neq B_r$ and $B_n$ and $B_r$ belong to the same stage, we merge them (UNION($B_n$, $B_r$)); otherwise ($B_n$ then belongs to a higher-numbered stage than $B_r$), we link them together and make $B_n$ the new root bundle.

To perform the merging of bundles efficiently, we regard the bundles as disjoint sets and apply Tarjan's algorithm to handle the disjoint-set operations FINDSET, MAKESET and UNION [8,9]. Specifically, by regarding each partial bundle tree, represented by its root bundle, as a set, the same technique for maintaining disjoint sets can be applied. To determine the root bundle of any given bundle, we keep a direct reference (pointer) to the root bundle, and apply the path compression technique [9] to update this pointer after every root query (FINDROOT).

That the bundling produces a tree of bundles should be clear from the algorithm. Note that any task node is queried by all of its immediate predecessors. If two immediate predecessors are in the same stage, they are merged together. If they are not in the same stage, we make the one in the earlier stage a predecessor of the one in the later stage, while eliminating the direct dependency between the former and the node being queried. This guarantees that every bundle has at most one parent, which in turn shows that the bundles form a tree. It is also straightforward to verify (inductively) that the operational precedence specified by the bundle tree does not conflict with that of the input dag. Clearly, the tasks in the root bundle are executed first, the child bundles of a given bundle are its successors, and the directed edges reflect the operational precedence among bundles. The dag-to-tree conversion is, therefore, correct.
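The bottom-up bundling step can be sketched in Python as below. The sketch rests on assumptions that are ours rather than the paper's: tasks are integers 0..n-1, the stage numbers are supplied by the caller (tasks with no successors in the lowest stage, and every task in a higher stage than its successors), FINDSET/UNION use union by rank plus path compression, and FINDROOT follows a second array of root pointers that are also path-compressed. The names `bottom_up_bundling`, `find`, `union` and `find_root` are hypothetical. For a task graph that is not connected, the sketch returns a forest of bundle trees rather than linking them into one tree.

```python
def bottom_up_bundling(n, edges, stage):
    """Group tasks 0..n-1 into bundles of equal-stage tasks linked as a tree.

    edges: (u, v) pairs meaning u < v. stage: stage number of every task.
    Returns (bundles, tree_edges): bundles maps a representative task to the
    tasks of its bundle; tree_edges holds (parent_rep, child_rep) pairs.
    """
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)

    parent = list(range(n))  # disjoint-set forest over tasks (MAKESET for all)
    rank = [0] * n
    up = list(range(n))      # pointer toward the root bundle of a partial tree

    def find(x):             # FINDSET with path compression
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a, b):         # UNION by rank; returns the new representative
        ra, rb = find(a), find(b)
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return ra

    def find_root(x):        # FINDROOT: root bundle of x's partial bundle tree
        b = find(x)
        path = []
        while True:
            nxt = find(up[b])
            if nxt == b:
                break
            path.append(b)
            b = nxt
        for p in path:       # path compression on the root pointers
            up[p] = b
        return b

    links = []
    for t in sorted(range(n), key=lambda v: stage[v]):  # last stage first
        for x in succ[t]:
            bn, br = find(t), find_root(x)
            if bn == br:
                continue
            if stage[br] == stage[t]:  # same stage: merge the two bundles
                r = union(bn, br)
                up[r] = r
            else:                      # B_n becomes the new root above B_r
                up[br] = bn
                links.append((bn, br))

    bundles = {}
    for t in range(n):
        bundles.setdefault(find(t), []).append(t)
    tree_edges = {(find(p), find(c)) for p, c in links}
    return bundles, tree_edges
```

With the five-task graph with edges 0→1, 0→2, 1→3, 2→3, 4→2 and stages `[3, 2, 2, 1, 3]`, this sketch produces the bundles {0, 4}, {1, 2} and {3}, linked as the chain {0, 4} → {1, 2} → {3}. Each bundle contains tasks of a single stage, which are therefore pairwise incomparable.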
2.2 Generating Cilk Procedures

The key idea is that when a bundle becomes ready (i.e., when all predecessors of the tasks in the bundle have completed), we spawn a Cilk procedure to execute the tasks included in the bundle in parallel. When all of these tasks have been executed, we spawn a Cilk procedure to handle each of the bundle's child bundles.

3 Performance Analysis

Theorem 3.1 Let n denote the number of nodes and e the number of edges in a given task graph G. The time complexity for producing a parallel schedule is no more than $O((n+e)(1+\alpha(n, n+e)))$, where $\alpha(x, y)$ denotes the inverse of Ackermann's function [9].

Proof. (Omitted. Cf. [11])
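To make the quantities compared in this section concrete, the following Python sketch computes the work $T_1$, the critical path length $T_\infty$, and the stage sum S for a task graph whose execution times are known. The function name `schedule_quantities` and the availability of per-task durations are assumptions for illustration only (the scheduler itself needs no execution times). Stage numbers are supplied by the caller and are assumed to follow the rule of Section 2.1, so that sorting by increasing stage visits every successor before its predecessors.

```python
def schedule_quantities(durations, edges, stage):
    """Return (T1, Tinf, S) for a task graph with known execution times.

    durations[t]: execution time of task t; edges: (u, v) pairs meaning u < v;
    stage[t]: stage number, larger than the stage of every successor of t.
    """
    n = len(durations)
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)

    t1 = sum(durations)                       # total work
    longest = [0] * n                         # longest path starting at task u
    for u in sorted(range(n), key=lambda v: stage[v]):  # successors first
        longest[u] = durations[u] + max((longest[v] for v in succ[u]), default=0)
    t_inf = max(longest, default=0)           # critical path length

    worst = {}                                # longest task in each stage
    for u in range(n):
        worst[stage[u]] = max(worst.get(stage[u], 0), durations[u])
    s = sum(worst.values())
    return t1, t_inf, s
```

For the graph with edges 0→1, 0→2, 1→3, 2→3, 4→2, stages `[3, 2, 2, 1, 3]` and durations `[1, 4, 1, 1, 3]`, the sketch returns `(10, 6, 8)`, so $S - T_\infty = 2$; with unit durations it returns `(5, 3, 3)`, and $S = T_\infty$.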
The following bound applies to the total execution time of a greedy schedule.

Lemma 3.2 For any task graph with work $T_1$ and critical path length $T_\infty$, and for any number P of processors, any greedy P-processor execution schedule achieves $T_P \le T_1/P + T_\infty$.

Proof. (Omitted. Cf. [11])

Notice that the bundling algorithm groups the tasks into bundles, each of which is spawned off and executed as a batch. Consequently, a ready task that belongs to a bundle that is not yet ready cannot be scheduled, even if idle processors are available: a ready task is scheduled if and only if the bundle embedding it has become ready and an idle processor is available. Clearly, once all the tasks in a given stage have been executed, all the tasks in the next lower stage become ready to be scheduled, because all of their predecessors must have been executed by then. This statement still holds in the extreme case where all the tasks in the same stage are grouped into the same bundle. This leads to the theorem below.

Theorem 3.3 With P processors, the expected completion time for a set of tasks embedded in a bundle tree is bounded by $T_P = O(T_1/P + S)$, where $T_1$ denotes the work, and S denotes the summation (over all stages) of the longest execution time of the tasks at each stage.

Proof. (Omitted. Cf. [11])

From Lemma 3.2 and Theorem 3.3, the difference between the two time bounds is $S - T_\infty$. Because consecutive tasks on any dependency path lie in distinct stages, and each contributes at most the longest execution time of its stage, $T_\infty \le S$ always holds. If all the tasks of a given task graph have identical execution times, then $S - T_\infty = 0$, since the number of stages equals the number of tasks on a longest dependency path. This means that the bound for the bundle tree matches the greedy bound of Lemma 3.2, up to the constant hidden by the O-notation, when all tasks have identical execution times.

4 Discussion and Conclusion

We have presented a provably efficient approach for static task scheduling based on the Cilk model. We have shown theoretically that the schedule generated by our approach is nearly optimal in certain cases. We have used this approach to implement a parallel Make facility on the IBM SP2 machine and obtained some preliminary results. Our work represents a serious attempt to use Cilk to handle nontrivial types of concurrency (as required in the task scheduling problem). The positive results suggest that the restriction identified by Theorem 1.1 does not fundamentally hinder the practical use of Cilk, and they have encouraged us to take this new tool seriously in future research.
Acknowledgment

We thank Charles Leiserson for generously providing the Cilk system, and for the fun of doing research with it.

References

[1] Hesham El-Rewini, Theodore G. Lewis, and Hesham H. Ali. "Task Scheduling in Parallel and Distributed Systems." Prentice Hall.
[2] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. "Cilk: An efficient multithreaded runtime system." In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, California, July.
[3] Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, and Keith H. Randall. "Dag-consistent distributed shared memory." In Proceedings of the 10th International Parallel Processing Symposium, Honolulu, Hawaii, April.
[4] Robert D. Blumofe. "Executing Multithreaded Programs Efficiently." Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, September.
[5] Robert D. Blumofe and Charles E. Leiserson. "Scheduling multithreaded computations by work stealing." In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico, November.
[6] Michael R. Garey and David S. Johnson. "Computers and Intractability: A Guide to the Theory of NP-Completeness." W. H. Freeman and Company, New York.
[7] Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, and Keith H. Randall. "An analysis of dag-consistent distributed shared-memory algorithms." In Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), Padua, Italy, June.
[8] Robert E. Tarjan. "On the efficiency of a good but not linear set merging algorithm." J. ACM 22:2.
[9] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. "Introduction to Algorithms." MIT Press.
[10] Robert E. Tarjan. "Data Structures and Network Algorithms." SIAM, Philadelphia.
[11] Vee Voon Yee. "Provably Efficient Task Scheduling by Using Cilk." HYP, NTU-SAS.
An Optimal Algorithm for the Euclidean Bottleneck Full Steiner Tree Problem Ahmad Biniaz Anil Maheshwari Michiel Smid September 30, 2013 Abstract Let P and S be two disjoint sets of n and m points in the
More informationCS 267 Applications of Parallel Computers. Lecture 23: Load Balancing and Scheduling. James Demmel
CS 267 Applications of Parallel Computers Lecture 23: Load Balancing and Scheduling James Demmel http://www.cs.berkeley.edu/~demmel/cs267_spr99 CS267 L23 Load Balancing and Scheduling.1 Demmel Sp 1999
More informationScalable GPU Graph Traversal!
Scalable GPU Graph Traversal Duane Merrill, Michael Garland, and Andrew Grimshaw PPoPP '12 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming Benwen Zhang
More informationLayer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints
Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints Jörg Dümmler, Raphael Kunis, and Gudula Rünger Chemnitz University of Technology, Department of Computer Science,
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationSCHEDULING WITH RELEASE TIMES AND DEADLINES ON A MINIMUM NUMBER OF MACHINES *
SCHEDULING WITH RELEASE TIMES AND DEADLINES ON A MINIMUM NUMBER OF MACHINES * Mark Cieliebak 1, Thomas Erlebach 2, Fabian Hennecke 1, Birgitta Weber 1, and Peter Widmayer 1 1 Institute of Theoretical Computer
More informationThe minimum spanning tree and duality in graphs
The minimum spanning tree and duality in graphs Wim Pijls Econometric Institute Report EI 2013-14 April 19, 2013 Abstract Several algorithms for the minimum spanning tree are known. The Blue-red algorithm
More informationT. Biedl and B. Genc. 1 Introduction
Complexity of Octagonal and Rectangular Cartograms T. Biedl and B. Genc 1 Introduction A cartogram is a type of map used to visualize data. In a map regions are displayed in their true shapes and with
More informationA SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES)
Chapter 1 A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES) Piotr Berman Department of Computer Science & Engineering Pennsylvania
More informationAn O(n 2.75 ) algorithm for online topological ordering
An O(n 2.75 ) algorithm for online topological ordering Deepak Ajwani, Tobias Friedrich, and Ulrich Meyer Max-Planck-Institut für Informatik, Saarbrücken, Germany Abstract. We present a simple algorithm
More informationSpace vs Time, Cache vs Main Memory
Space vs Time, Cache vs Main Memory Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 (Moreno Maza) Space vs Time, Cache vs Main Memory CS 4435 - CS 9624 1 / 49
More informationAlgorithms and Theory of Computation. Lecture 5: Minimum Spanning Tree
Algorithms and Theory of Computation Lecture 5: Minimum Spanning Tree Xiaohui Bei MAS 714 August 31, 2017 Nanyang Technological University MAS 714 August 31, 2017 1 / 30 Minimum Spanning Trees (MST) A
More informationA Simpler Proof Of The Average Case Complexity Of Union-Find. With Path Compression
LBNL-57527 A Simpler Proof Of The Average Case Complexity Of Union-Find With Path Compression Kesheng Wu and Ekow Otoo Lawrence Berkeley National Laboratory, Berkeley, CA, USA {KWu, EJOtoo}@lbl.gov April
More informationTwo-Stage Fault-Tolerant k-ary Tree Multiprocessors
Two-Stage Fault-Tolerant k-ary Tree Multiprocessors Baback A. Izadi Department of Electrical and Computer Engineering State University of New York 75 South Manheim Blvd. New Paltz, NY 1561 U.S.A. bai@engr.newpaltz.edu
More informationCache-Oblivious Traversals of an Array s Pairs
Cache-Oblivious Traversals of an Array s Pairs Tobias Johnson May 7, 2007 Abstract Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and develop a cache-oblivious
More informationSAT-CNF Is N P-complete
SAT-CNF Is N P-complete Rod Howell Kansas State University November 9, 2000 The purpose of this paper is to give a detailed presentation of an N P- completeness proof using the definition of N P given
More informationClustering Using Graph Connectivity
Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the
More informationNested Parallelism in Transactional Memory
Nested Parallelism in Transactional Memory Kunal Agrawal Jeremy T. Fineman Jim Sukha Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139, USA
More informationLatency-Hiding Work Stealing
Latency-Hiding Work Stealing Stefan K. Muller April 2017 CMU-CS-16-112R Umut A. Acar School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 A version of this work appears in the proceedings
More informationA Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components
A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components Ingo Wegener FB Informatik, LS2, Univ. Dortmund, 44221 Dortmund, Germany wegener@ls2.cs.uni-dortmund.de
More informationA note on Baker s algorithm
A note on Baker s algorithm Iyad A. Kanj, Ljubomir Perković School of CTI, DePaul University, 243 S. Wabash Avenue, Chicago, IL 60604-2301. Abstract We present a corrected version of Baker s algorithm
More informationAlgorithms and Theory of Computation. Lecture 5: Minimum Spanning Tree
Algorithms and Theory of Computation Lecture 5: Minimum Spanning Tree Xiaohui Bei MAS 714 August 31, 2017 Nanyang Technological University MAS 714 August 31, 2017 1 / 30 Minimum Spanning Trees (MST) A
More informationGraph Algorithms. Chapter 22. CPTR 430 Algorithms Graph Algorithms 1
Graph Algorithms Chapter 22 CPTR 430 Algorithms Graph Algorithms Why Study Graph Algorithms? Mathematical graphs seem to be relatively specialized and abstract Why spend so much time and effort on algorithms
More informationMultithreaded Programming in Cilk. Matteo Frigo
Multithreaded Programming in Cilk Matteo Frigo Multicore challanges Development time: Will you get your product out in time? Where will you find enough parallel-programming talent? Will you be forced to
More informationMulti-core Computing Lecture 1
Hi-Spade Multi-core Computing Lecture 1 MADALGO Summer School 2012 Algorithms for Modern Parallel and Distributed Models Phillip B. Gibbons Intel Labs Pittsburgh August 20, 2012 Lecture 1 Outline Multi-cores:
More informationSilkRoad: A Multithreaded Runtime System with Software Distributed Shared Memory for SMP Clusters
SilkRoad: A Multithreaded Runtime System with Software Distributed Shared Memory for SMP Clusters L. Peng, W.F. Wong, M.D. Feng and C.K. Yuen Department of Computer Science National University of Singapore
More informationMaximum flows & Maximum Matchings
Chapter 9 Maximum flows & Maximum Matchings This chapter analyzes flows and matchings. We will define flows and maximum flows and present an algorithm that solves the maximum flow problem. Then matchings
More informationA Distribution-Sensitive Dictionary with Low Space Overhead
A Distribution-Sensitive Dictionary with Low Space Overhead Prosenjit Bose, John Howat, and Pat Morin School of Computer Science, Carleton University 1125 Colonel By Dr., Ottawa, Ontario, CANADA, K1S 5B6
More informationCilk: An Efficient Multithreaded Runtime System
Cilk: An Efficient Multithreaded Runtime System ROBERT D. BLUMOFE, CHRISTOPHER F. JOERG, BRADLEY C. KUSZMAUL, CHARLES E. LEISERSON, KEITH H. RANDALL, AND YULI ZHOU MIT Laboratory for Computer Science,
More informationDynamic Processor Allocation for Adaptively Parallel Work-Stealing Jobs. Siddhartha Sen
Dynamic Processor Allocation for Adaptively Parallel Work-Stealing Jobs by Siddhartha Sen Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements
More informationComplexity Analysis of Routing Algorithms in Computer Networks
Complexity Analysis of Routing Algorithms in Computer Networks Peter BARTALOS Slovak University of Technology Faculty of Informatics and Information Technologies Ilkovičova 3, 84 6 Bratislava, Slovakia
More information