CS 140 : Feb 19, 2015 Cilk Scheduling & Applications
1 CS 140 : Feb 19, 2015 Cilk Scheduling & Applications
- Analyzing quicksort
- Optional: Master method for solving divide-and-conquer recurrences
- Tips on parallelism and overheads
- Greedy scheduling and parallel slackness
- Cilk runtime
Thanks to Charles E. Leiserson for some of these slides
2 Potential Parallelism
Because the Span Law dictates that $T_P \ge T_\infty$, the maximum possible speedup given $T_1$ and $T_\infty$ is $T_1/T_\infty$ = potential parallelism = the average amount of work per step along the span.
3 Sorting
- Sorting is possibly the most frequently executed operation in computing!
- Quicksort is often the fastest sorting algorithm in practice, with an average running time of $O(N \log N)$ (but $O(N^2)$ worst-case performance).
- Mergesort has worst-case performance of $O(N \log N)$ for sorting $N$ elements.
- Both are based on the recursive divide-and-conquer paradigm.
4 QUICKSORT
Basic quicksort sorting an array $S$ works as follows:
- If the number of elements in $S$ is 0 or 1, then return.
- Pick any element $v$ in $S$. Call this the pivot.
- Partition the set $S - \{v\}$ into two disjoint groups: $S_1 = \{x \in S - \{v\} \mid x \le v\}$ and $S_2 = \{x \in S - \{v\} \mid x \ge v\}$ (an element equal to $v$ goes into exactly one of the two groups).
- Return quicksort($S_1$) followed by $v$ followed by quicksort($S_2$).
5 QUICKSORT Select Pivot
6 QUICKSORT Partition around Pivot
7 QUICKSORT Quicksort recursively
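The three steps pictured above (select a pivot, partition around it, recurse) can be sketched in standard C++. This is a minimal illustration rather than the course's code: the name `serial_quicksort`, the choice of the first element as pivot, and the use of `std::partition` are assumptions made for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Sorts [begin, end) in place. Hypothetical helper, for illustration only.
// Requires random-access iterators.
template <typename It>
void serial_quicksort(It begin, It end) {
    // Base case: 0 or 1 elements are already sorted.
    if (end - begin < 2) return;

    // Pick a pivot v (here the first element, an arbitrary choice).
    const auto pivot = *begin;

    // Partition S - {v}: elements < v move to the front (S1),
    // the remaining elements (>= v) form S2.
    It middle = std::partition(std::next(begin), end,
                               [&pivot](const auto& x) { return x < pivot; });

    // Place v between S1 and S2 by swapping it with the last element of S1.
    It pivot_pos = std::prev(middle);
    std::iter_swap(begin, pivot_pos);

    serial_quicksort(begin, pivot_pos);  // quicksort(S1)
    serial_quicksort(middle, end);       // quicksort(S2)
}
```

Calling `serial_quicksort(v.begin(), v.end())` on a `std::vector<int>` sorts it in ascending order. Always taking the first element as pivot is the simplest choice but triggers the quadratic worst case on already-sorted input.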
8 Parallelizing Quicksort
Serial quicksort sorts an array $S$ as follows:
- If the number of elements in $S$ is 0 or 1, then return.
- Pick any element $v$ in $S$. Call this the pivot.
- Partition the set $S - \{v\}$ into two disjoint groups: $S_1 = \{x \in S - \{v\} \mid x \le v\}$ and $S_2 = \{x \in S - \{v\} \mid x \ge v\}$ (an element equal to $v$ goes into exactly one of the two groups).
- Return quicksort($S_1$) followed by $v$ followed by quicksort($S_2$).
9 Parallel Quicksort (Basic)
The second recursive call to qsort does not depend on the results of the first recursive call. We have an opportunity to speed up the call by making both calls in parallel.

    template <typename T>
    void qsort(T begin, T end) {
      if (begin != end) {
        T middle = partition(begin, end, /* predicate omitted in the source */);
        cilk_spawn qsort(begin, middle);
        qsort(max(begin + 1, middle), end);  // No cilk_spawn on this line!
        cilk_sync;
      }
    }
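For readers without a Cilk compiler, the same fork-join shape can be approximated in standard C++. The sketch below swaps in `std::async` for `cilk_spawn` and `future::get` for `cilk_sync`; this is a substitution made for illustration, not how Cilk works internally, since `std::launch::async` starts a new thread per fork instead of using a work-stealing scheduler. The names `parallel_quicksort` and `kSerialCutoff`, and the cutoff value, are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Below this many elements, recurse serially instead of forking.
// Hypothetical value; it keeps the number of threads small.
constexpr std::ptrdiff_t kSerialCutoff = 1 << 12;

// Sorts [begin, end) in place; requires random-access iterators.
template <typename It>
void parallel_quicksort(It begin, It end) {
    if (end - begin < 2) return;

    // Serial partition around the first element, as in the basic version.
    const auto pivot = *begin;
    It middle = std::partition(std::next(begin), end,
                               [&pivot](const auto& x) { return x < pivot; });
    It pivot_pos = std::prev(middle);
    std::iter_swap(begin, pivot_pos);

    if (end - begin < kSerialCutoff) {
        // Small subproblem: plain recursive calls, no forking.
        parallel_quicksort(begin, pivot_pos);
        parallel_quicksort(middle, end);
        return;
    }

    // Fork the first recursive call (plays the role of cilk_spawn).
    auto left = std::async(std::launch::async, [begin, pivot_pos] {
        parallel_quicksort(begin, pivot_pos);
    });
    // The second call runs in the current thread: no fork on this line.
    parallel_quicksort(middle, end);
    // Wait for the forked call (plays the role of cilk_sync).
    left.get();
}
```

The two recursive calls touch disjoint ranges, which is why they can run concurrently without synchronization beyond the final join.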
10-11 Actual Performance

    ./qsort cilk_set_worker_count 1   >> 0.083 seconds
    ./qsort cilk_set_worker_count 16  >> 0.014 seconds
    Speedup = $T_1/T_{16}$ = 0.083/0.014 = 5.93

    ./qsort cilk_set_worker_count 1   >> 10.57 seconds
    ./qsort cilk_set_worker_count 16  >> 1.58 seconds
    Speedup = $T_1/T_{16}$ = 10.57/1.58 = 6.67

Why not better?
12 Measure Work/Span Empirically
cilkview -w ./qsort is run twice. Each run reports Work, Span, Burdened span, Parallelism, Burdened parallelism, #Spawn, and #Atomic instructions. Only the last value survives in this transcription: #Atomic instructions = 14 for the first run and 8 for the second.

    workspan ws;
    ws.start();
    sample_qsort(a, a + n);
    ws.stop();
    ws.report(std::cout);
13 Analyzing Quicksort
Quicksort recursively. Assume we have a great partitioner that always generates two balanced sets.
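14 Analyzing Quicksort
Work:
$T_1(n) = 2T_1(n/2) + \Theta(n)$
$2T_1(n/2) = 4T_1(n/4) + 2\,\Theta(n/2)$
$\vdots$
$(n/2)\,T_1(2) = n\,T_1(1) + (n/2)\,\Theta(2)$
Adding up the $\lg n$ levels, each of which contributes $\Theta(n)$, gives $T_1(n) = \Theta(n \lg n)$.
Span recurrence: $T_\infty(n) = T_\infty(n/2) + \Theta(n)$, which solves to $T_\infty(n) = \Theta(n)$.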
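The two recurrences can be checked numerically. The sketch below fixes every hidden constant to 1 and restricts $n$ to powers of two; both choices, and the function names, are assumptions made only to obtain exact closed forms to compare against.

```cpp
#include <cassert>
#include <cstdint>

// True if n is a power of two (n >= 1).
bool is_power_of_two(std::uint64_t n) { return n >= 1 && (n & (n - 1)) == 0; }

// Work with unit constants: T1(n) = 2*T1(n/2) + n, T1(1) = 1.
// Closed form for powers of two: n*lg(n) + n.
std::uint64_t quicksort_work(std::uint64_t n) {
    assert(is_power_of_two(n));
    return n == 1 ? 1 : 2 * quicksort_work(n / 2) + n;
}

// Span with a serial partition: Tinf(n) = Tinf(n/2) + n, Tinf(1) = 1.
// Closed form for powers of two: 2n - 1.
std::uint64_t quicksort_span(std::uint64_t n) {
    assert(is_power_of_two(n));
    return n == 1 ? 1 : quicksort_span(n / 2) + n;
}
```

For $n = 1024$ this gives work $1024 \cdot 10 + 1024 = 11264$ and span $2047$, so the ratio is about $5.5$: it grows like $\lg n$, not like $n$.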
15 Analyzing Quicksort
Parallelism: $T_1(n)/T_\infty(n) = \Theta(\lg n)$. Not much!
However, partitioning (i.e., constructing the array $S_1 = \{x \in S - \{v\} \mid x \le v\}$) can be accomplished in parallel in time $\Theta(\lg n)$,
which gives a span $T_\infty(n) = \Theta(\lg^2 n)$
and parallelism $\Theta(n/\lg n)$. Way better!
Basic parallel qsort can be found under $cilkpath/examples/qsort
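The improved span follows by unrolling the recurrence with a $\Theta(\lg n)$ partition step. The derivation below is a sketch that assumes balanced splits and $n$ a power of two, as on the previous slides.

```latex
\begin{align*}
T_\infty(n) &= T_\infty(n/2) + \Theta(\lg n) \\
            &= \sum_{i=0}^{\lg n - 1} \Theta\bigl(\lg (n/2^i)\bigr) + \Theta(1) \\
            &= \Theta\Bigl(\sum_{j=1}^{\lg n} j\Bigr)
             = \Theta(\lg^2 n), \\[4pt]
\frac{T_1(n)}{T_\infty(n)} &= \frac{\Theta(n \lg n)}{\Theta(\lg^2 n)}
             = \Theta\!\left(\frac{n}{\lg n}\right).
\end{align*}
```

The second line uses $\lg(n/2^i) = \lg n - i$, so the terms run over $\lg n, \lg n - 1, \dots, 1$.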
16 The Master Method (Optional)
The Master Method for solving recurrences applies to recurrences of the form $T(n) = a\,T(n/b) + f(n)$ *, where $a \ge 1$, $b > 1$, and $f$ is asymptotically positive.
IDEA: Compare $n^{\log_b a}$ with $f(n)$.
* The base case is always $T(n) = \Theta(1)$ for sufficiently small $n$.
17 Master Method CASE 1
$T(n) = a\,T(n/b) + f(n)$ with $n^{\log_b a} \gg f(n)$.
Specifically, $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant $\varepsilon > 0$.
Solution: $T(n) = \Theta(n^{\log_b a})$.
Strassen matrix multiplication: $a = 7$, $b = 2$, $f(n) = n^2$ $\Rightarrow$ $T_1(n) = \Theta(n^{\log_2 7}) = O(n^{2.81})$.
18 Master Method CASE 2
$T(n) = a\,T(n/b) + f(n)$ with $n^{\log_b a} \approx f(n)$.
Specifically, $f(n) = \Theta(n^{\log_b a} \lg^k n)$ for some constant $k \ge 0$.
Solution: $T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$.
Quicksort work: $a = 2$, $b = 2$, $f(n) = n$, $k = 0$ $\Rightarrow$ $T_1(n) = \Theta(n \lg n)$.
Qsort span: $a = 1$, $b = 2$, $f(n) = \lg n$, $k = 1$ $\Rightarrow$ $T_\infty(n) = \Theta(\lg^2 n)$.
19 Master Method CASE 3
$T(n) = a\,T(n/b) + f(n)$ with $n^{\log_b a} \ll f(n)$.
Specifically, $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some constant $\varepsilon > 0$, and (regularity) $a\,f(n/b) \le c\,f(n)$ for some constant $c < 1$.
Solution: $T(n) = \Theta(f(n))$.
E.g., qsort span (bad version): $a = 1$, $b = 2$, $f(n) = n$ $\Rightarrow$ $T_\infty(n) = \Theta(n)$.
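20 Master Method Summary
$T(n) = a\,T(n/b) + f(n)$
CASE 1: $f(n) = O(n^{\log_b a - \varepsilon})$, constant $\varepsilon > 0$ $\Rightarrow$ $T(n) = \Theta(n^{\log_b a})$.
CASE 2: $f(n) = \Theta(n^{\log_b a} \lg^k n)$, constant $k \ge 0$ $\Rightarrow$ $T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$.
CASE 3: $f(n) = \Omega(n^{\log_b a + \varepsilon})$, constant $\varepsilon > 0$, and regularity condition $\Rightarrow$ $T(n) = \Theta(f(n))$.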
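The case analysis can be mechanized for the common situation where the driving function has the form $f(n) = \Theta(n^d \lg^k n)$. The sketch below is restricted to that form (an assumption, under which the regularity condition of Case 3 holds automatically); `MasterResult` and `master_method` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Outcome for T(n) = a*T(n/b) + Theta(n^d * lg^k n).
// The solution is T(n) = Theta(n^n_exp * lg^lg_exp n).
struct MasterResult {
    int which_case;  // 1, 2, or 3
    double n_exp;    // exponent of n in the solution
    int lg_exp;      // exponent of lg n in the solution
};

MasterResult master_method(double a, double b, double d, int k) {
    assert(a >= 1.0 && b > 1.0 && d >= 0.0 && k >= 0);
    const double crit = std::log(a) / std::log(b);  // log_b a
    const double eps = 1e-9;                        // floating-point tolerance

    if (d < crit - eps) return {1, crit, 0};  // f(n) polynomially smaller
    if (d > crit + eps) return {3, d, k};     // f(n) polynomially larger
    return {2, crit, k + 1};                  // same polynomial order
}
```

With these inputs, `master_method(2, 2, 1, 0)` lands in Case 2 with solution $\Theta(n \lg n)$ (quicksort work), `master_method(1, 2, 0, 1)` gives $\Theta(\lg^2 n)$ (qsort span), and `master_method(1, 2, 1, 0)` lands in Case 3 with $\Theta(n)$ (the bad span).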
21 Potential Parallelism
Because the Span Law dictates that $T_P \ge T_\infty$, the maximum possible speedup given $T_1$ and $T_\infty$ is $T_1/T_\infty$ = potential parallelism = the average amount of work per step along the span.
22 Three Tips on Parallelism
1. Minimize span to maximize parallelism. Try to generate 10 times more parallelism than processors for near-perfect linear speedup.
2. If you have plenty of parallelism, try to trade some of it off for reduced work overheads.
3. Use divide-and-conquer recursion or parallel loops rather than spawning one small thing off after another.
Do this:

    cilk_for (int i=0; i<n; ++i) {
      foo(i);
    }

Not this:

    for (int i=0; i<n; ++i) {
      cilk_spawn foo(i);
    }
    cilk_sync;
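The difference between the two loops is structural: a parallel loop is typically executed by recursive halving, so the forks form a tree of depth about $\lg n$, whereas the spawning loop issues its $n$ spawns one after another. The sketch below shows the halving pattern in standard C++, with `std::async` standing in for `cilk_spawn`. The name `parallel_for_dc` and the `grain` parameter are assumptions, and this is not a description of how any particular Cilk compiler lowers `cilk_for`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>

// Runs body(i) for every i in [lo, hi) by recursive halving.
// Ranges of at most `grain` iterations run serially (coarsening).
void parallel_for_dc(int lo, int hi, int grain,
                     const std::function<void(int)>& body) {
    assert(grain >= 1);
    if (hi - lo <= grain) {
        for (int i = lo; i < hi; ++i) body(i);  // serial leaf
        return;
    }
    const int mid = lo + (hi - lo) / 2;
    // Fork the left half; std::async stands in for cilk_spawn here.
    auto left = std::async(std::launch::async, [=, &body] {
        parallel_for_dc(lo, mid, grain, body);
    });
    parallel_for_dc(mid, hi, grain, body);  // right half runs in this thread
    left.get();                             // join, like cilk_sync
}
```

A call such as `parallel_for_dc(0, n, 100, foo)` then plays the role of the `cilk_for` loop; the body must be safe to run concurrently for different values of `i`.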
23 Three Tips on Overheads
1. Make sure that work/#spawns is not too small. Coarsen by using function calls and inlining near the leaves of recursion rather than spawning.
2. Parallelize outer loops if you can, not inner loops (otherwise, you'll have a high burdened span, and therefore low burdened parallelism, once runtime and scheduling overhead are included). If you must parallelize an inner loop, coarsen it, but not too much. 500 iterations should be plenty coarse for even the most meager loop. Fewer iterations should suffice for fatter loops.
3. Use reducers only in sufficiently fat loops.
24 Scheduling
- A strand is a sequence of instructions that doesn't contain any parallel constructs.
- Cilk allows the programmer to express potential parallelism in an application.
- The Cilk scheduler maps strands onto processors dynamically at runtime.
- Since on-line schedulers are complicated, we'll explore the ideas with an off-line scheduler.
[Figure: processors with caches ($) connected through a network to memory and I/O.]
25-27 Greedy Scheduling
IDEA: Do as much as possible on every step.
Definition: A strand is ready if all its predecessors have executed.
Complete step: $\ge P$ strands ready. Run any $P$.
Incomplete step: $< P$ strands ready. Run all of them.
[Figure: example dag scheduled with $P = 3$.]
28 Analysis of Greedy
Theorem: Any greedy scheduler achieves $T_P \le T_1/P + T_\infty$.
Proof.
- # complete steps $\le T_1/P$, since each complete step performs $P$ work.
- # incomplete steps $\le T_\infty$, since each incomplete step reduces the span of the unexecuted dag by 1.
[Figure: example dag scheduled with $P = 3$.]
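The greedy rule and its bound can be tried out on a small dag with an off-line simulation. The sketch below assumes unit-time strands numbered in topological order; the `Dag` type, the `fork_join_dag` example, and the function names are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A dag of unit-time strands. succ[u] lists the strands that depend on u.
// Assumption: strands are numbered in topological order (u < v per edge).
struct Dag {
    std::vector<std::vector<int>> succ;
};

// Example dag: strand 0 forks `width` independent strands that all join
// in one final strand.
Dag fork_join_dag(int width) {
    Dag g;
    g.succ.resize(static_cast<std::size_t>(width) + 2);
    for (int i = 1; i <= width; ++i) {
        g.succ[0].push_back(i);
        g.succ[i].push_back(width + 1);
    }
    return g;
}

// Work T_1: the number of strands.
int dag_work(const Dag& g) { return static_cast<int>(g.succ.size()); }

// Span T_inf: the number of strands on a longest path.
int dag_span(const Dag& g) {
    std::vector<int> depth(g.succ.size(), 1);
    int longest = 0;
    for (std::size_t u = 0; u < g.succ.size(); ++u) {
        longest = std::max(longest, depth[u]);
        for (int v : g.succ[u]) depth[v] = std::max(depth[v], depth[u] + 1);
    }
    return longest;
}

// Number of steps T_P a greedy scheduler needs on p processors.
int greedy_steps(const Dag& g, int p) {
    assert(p >= 1);
    std::vector<int> pending(g.succ.size(), 0);  // unexecuted predecessors
    for (const auto& out : g.succ)
        for (int v : out) ++pending[v];

    std::vector<int> ready;
    for (std::size_t u = 0; u < pending.size(); ++u)
        if (pending[u] == 0) ready.push_back(static_cast<int>(u));

    int steps = 0;
    while (!ready.empty()) {
        // Complete step: run any p ready strands. Incomplete step: run all.
        const std::size_t take = std::min<std::size_t>(p, ready.size());
        std::vector<int> running(
            ready.end() - static_cast<std::ptrdiff_t>(take), ready.end());
        ready.resize(ready.size() - take);
        for (int u : running)
            for (int v : g.succ[u])
                if (--pending[v] == 0) ready.push_back(v);  // ready next step
        ++steps;
    }
    return steps;
}
```

For `fork_join_dag(4)` the work is 6 and the span is 3. On 2 processors the simulation takes 4 steps (one incomplete step for the fork, two complete steps, one incomplete step for the join), which is within the bound $6/2 + 3 = 6$.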
29 Optimality of Greedy
Theorem. Any greedy scheduler achieves within a factor of 2 of optimal.
Proof. Let $T_P^*$ be the execution time produced by the optimal scheduler. Since $T_P^* \ge \max\{T_1/P, T_\infty\}$ by the Work and Span Laws, we have
$T_P \le T_1/P + T_\infty \le 2\max\{T_1/P, T_\infty\} \le 2\,T_P^*$.
30 Linear Speedup
Theorem. Any greedy scheduler achieves near-perfect linear speedup whenever $P \ll T_1/T_\infty$.
Proof. Since $P \ll T_1/T_\infty$ is equivalent to $T_\infty \ll T_1/P$, the Greedy Scheduling Theorem gives us $T_P \le T_1/P + T_\infty \approx T_1/P$. Thus, the speedup is $T_1/T_P \approx P$.
Definition. The quantity $T_1/(P\,T_\infty)$ is called the parallel slackness.
31-40 Cilk Runtime System
Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack.
When a worker runs out of work, it steals from the top of a random victim's deque.
[Animation of the worker deques, one frame per slide. Labelled events: slide 31 Call!; slide 32 Spawn!; slide 33 Spawn! Call! Spawn!; slides 34-35 Return!; slides 36-37 Steal!; slide 39 Spawn!]
41 Cilk Runtime System
Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack.
Theorem: With sufficient parallelism, workers steal infrequently $\Rightarrow$ linear speed-up.
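The deque discipline described on these slides (the owner pushes and pops at the bottom, thieves take from the top) can be sketched as a small class. This is a deliberately simplified illustration: it protects every operation with a mutex, which a production runtime would avoid on the owner's fast path, and the names `WorkDeque`, `push_bottom`, `pop_bottom`, and `steal_top` are assumptions.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Simplified work deque for one worker (requires C++17 for std::optional).
template <typename Task>
class WorkDeque {
public:
    // Owner: a spawn or call pushes work on the bottom.
    void push_bottom(Task t) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(t));
    }

    // Owner: a return pops the most recently pushed work (stack order).
    std::optional<Task> pop_bottom() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        Task t = std::move(items_.back());
        items_.pop_back();
        return t;
    }

    // Thief: an idle worker steals the oldest work from the top.
    std::optional<Task> steal_top() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        Task t = std::move(items_.front());
        items_.pop_front();
        return t;
    }

private:
    std::mutex mutex_;
    std::deque<Task> items_;
};
```

After pushing 1, 2, 3, a thief calling `steal_top()` receives 1 while the owner's `pop_bottom()` returns 3. Stealing from the top takes the oldest entry, which tends to sit nearest the root of the computation, while the owner keeps working on the most recent one.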
University of Waterloo Department of Electrical and Computer Engineering ECE 250 Data Structures and Algorithms Instructor: Douglas Wilhelm Harder Time: 2.5 hours Aides: none 18 pages Final Examination
More informationCS S-11 Sorting in Θ(nlgn) 1. Base Case: A list of length 1 or length 0 is already sorted. Recursive Case:
CS245-2015S-11 Sorting in Θ(nlgn) 1 11-0: Merge Sort Recursive Sorting Base Case: A list of length 1 or length 0 is already sorted Recursive Case: Split the list in half Recursively sort two halves Merge
More informationCOMP2012H Spring 2014 Dekai Wu. Sorting. (more on sorting algorithms: mergesort, quicksort, heapsort)
COMP2012H Spring 2014 Dekai Wu Sorting (more on sorting algorithms: mergesort, quicksort, heapsort) Merge Sort Recursive sorting strategy. Let s look at merge(.. ) first. COMP2012H (Sorting) 2 COMP2012H
More informationDivide and Conquer. Design and Analysis of Algorithms Andrei Bulatov
Divide and Conquer Design and Analysis of Algorithms Andrei Bulatov Algorithms Divide and Conquer 4-2 Divide and Conquer, MergeSort Recursive algorithms: Call themselves on subproblem Divide and Conquer
More informationMath 501 Solutions to Homework Assignment 10
Department of Mathematics Discrete Mathematics Math 501 Solutions to Homework Assignment 10 Section 7.3 Additional Exercises 1. int CountValuesLessThanT(int a[], int n, int T) { int count = 0; for (int
More informationComputer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Divide and Conquer
Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Divide and Conquer Divide and-conquer is a very common and very powerful algorithm design technique. The general idea:
More informationThe Cilkview Scalability Analyzer
The Cilkview Scalability Analyzer Yuxiong He Charles E. Leiserson William M. Leiserson Intel Corporation Nashua, New Hampshire ABSTRACT The Cilkview scalability analyzer is a software tool for profiling,
More informationChapter 2 Divid d e-an -Conquer 1
Chapter 2 Divide-and-Conquerid d 1 A top-down approach Patterned after the strategy employed by Napoleon Divide an instance of a problem recursively into two or more smaller instances until the solutions
More informationUnit Outline. Comparing Sorting Algorithms Heapsort Mergesort Quicksort More Comparisons Complexity of Sorting 2 / 33
Unit #4: Sorting CPSC : Basic Algorithms and Data Structures Anthony Estey, Ed Knorr, and Mehrdad Oveisi 0W Unit Outline Comparing Sorting Algorithms Heapsort Mergesort Quicksort More Comparisons Complexity
More informationCS Divide and Conquer
CS483-07 Divide and Conquer Instructor: Fei Li Room 443 ST II Office hours: Tue. & Thur. 1:30pm - 2:30pm or by appointments lifei@cs.gmu.edu with subject: CS483 http://www.cs.gmu.edu/ lifei/teaching/cs483_fall07/
More informationData Structures and Algorithms CSE 465
Data Structures and Algorithms CSE 465 LECTURE 4 More Divide and Conquer Binary Search Exponentiation Multiplication Sofya Raskhodnikova and Adam Smith Review questions How long does Merge Sort take on
More informationCOMP 3403 Algorithm Analysis Part 2 Chapters 4 5. Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University
COMP 3403 Algorithm Analysis Part 2 Chapters 4 5 Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University Chapter 4 Decrease and Conquer Chapter 4 43 Chapter 4: Decrease and Conquer Idea:
More informationCS483 Analysis of Algorithms Lecture 03 Divide-n-Conquer
CS483 Analysis of Algorithms Lecture 03 Divide-n-Conquer Jyh-Ming Lien February 05, 2009 this lecture note is based on Algorithms by S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani and to the Design
More informationIntroduction to Algorithms
Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Clifford Stein Introduction to Algorithms Third Edition The MIT Press Cambridge, Massachusetts London, England 27 Multithreaded Algorithms The vast
More informatione-pg PATHSHALA- Computer Science Design and Analysis of Algorithms Module 10
e-pg PATHSHALA- Computer Science Design and Analysis of Algorithms Module 10 Component-I (B) Description of Module Items Subject Name Computer Science Description of Module Paper Name Module Name/Title
More informationPlotting run-time graphically. Plotting run-time graphically. CS241 Algorithmics - week 1 review. Prefix Averages - Algorithm #1
CS241 - week 1 review Special classes of algorithms: logarithmic: O(log n) linear: O(n) quadratic: O(n 2 ) polynomial: O(n k ), k 1 exponential: O(a n ), a > 1 Classifying algorithms is generally done
More information