CS 140 : Feb 19, 2015 Cilk Scheduling & Applications

Size: px
Start display at page:

Download "CS 140 : Feb 19, 2015 Cilk Scheduling & Applications"

Transcription

1 CS 140 : Feb 19, 2015 Cilk Scheduling & Applications Analyzing quicksort Optional: Master method for solving divide-and-conquer recurrences Tips on parallelism and overheads Greedy scheduling and parallel slackness Cilk runtime Thanks to Charles E. Leiserson for some of these slides 1

2 otential arallelism Because the Span Law dictates that T T, the maximum possible speedup given T 1 and T is T 1 /T = potential parallelism = the average amount of work per step along the span. 2

3 Sorting Sorting is possibly the most frequently executed operation in computing! Quicksort is the fastest sorting algorithm in practice with an average running time of O(N log N), (but O(N 2 ) worst case performance) Mergesort has worst case performance of O(N log N) for sorting N elements Both based on the recursive divide-andconquer paradigm 3

4 QUICKSORT Basic Quicksort sorting an array S works as follows: If the number of elements in S is 0 or 1, then return. ick any element v in S. Call this pivot. artition the set S-{v} into two disjoint groups: S 1 = {x ε S-{v} x v} S 2 = {x ε S-{v} x v} Return quicksort(s 1 ) followed by v followed by quicksort(s 2 ) 4

5 QUICKSORT Select ivot

6 QUICKSORT artition around ivot

7 QUICKSORT Quicksort recursively

8 arallelizing Quicksort Serial Quicksort sorts an array S as follows: If the number of elements in S is 0 or 1, then return. ick any element v in S. Call this pivot. artition the set S-{v} into two disjoint groups: S 1 = {x ε S-{v} x v} S 2 = {x ε S-{v} x v} Return quicksort(s 1 ) followed by v followed by quicksort(s 2 ) 8

9 arallel Quicksort (Basic) The second recursive to qsort does not depend on the results of the first recursive We have an opportunity to speed up the by making both s in parallel. template <typename T> void qsort(t begin, T end) { if (begin!= end) { T middle = partition(begin, end, ); } } cilk_ qsort(begin, middle); qsort(max(begin + 1, middle), end); cilk_sync; // No cilk_ on this line! 9

10 Actual erformance./qsort cilk_set_worker_count 1 >> seconds./qsort cilk_set_worker_count 16 >> seconds Speedup = T 1 /T 16 = 0.083/0.014 =

11 Actual erformance./qsort cilk_set_worker_count 1 >> seconds./qsort cilk_set_worker_count 16 >> seconds Speedup = T 1 /T 16 = 0.083/0.014 = 5.93./qsort cilk_set_worker_count 1 >> seconds./qsort cilk_set_worker_count 16 >> 1.58 seconds Speedup = T 1 /T 16 = 10.57/1.58 = 6.67 Why not better??? 11

12 Measure Work/Span Empiriy cilkview -w./qsort Work = Span = Burdened span = arallelism = Burdened parallelism = #Spawn = #Atomic instructions = 14 cilkview -w./qsort Work = Span = Burdened span = arallelism = Burdened parallelism = #Spawn = #Atomic instructions = 8 workspan ws; ws.start(); sample_qsort(a, a + n); ws.stop(); ws.report(std::cout); 12

13 Analyzing Quicksort Quicksort recursively Assume we have a great partitioner that always generates two balanced sets 13

14 Analyzing Quicksort Work: T 1 (n) = 2T 1 (n/2) + Θ(n) 2T 1 (n/2) = 4T 1 (n/4) + 2 Θ(n/2).. + n/2 T 1 (2) = n T 1 (1) + n/2 Θ(2) T 1 (n) = Θ(n lg n) Span recurrence: T (n) = T (n/2) + Θ(n) Solves to T (n) = Θ(n) 14

15 Analyzing Quicksort arallelism: T 1 (n) T (n) = Θ(lg n) Not much! Indeed, partitioning (i.e., constructing the array S 1 = {x ε S-{v} x v}) can be accomplished in parallel in time Θ(lg n) Which gives a span T (n) = Θ(lg 2 n ) And parallelism Θ(n/lg n) Way better! Basic parallel qsort can be found under $cilkpath/examples/qsort 15

16 The Master Method (Optional) The Master Method for solving recurrences applies to recurrences of the form T(n) = a T(n/b) + f(n) *, where a 1, b > 1, and f is asymptotiy positive. IDEA: Compare n log ba with f(n). * The base case is always T(n) = Θ(1) for sufficiently small n. 16

17 Master Method CASE 1 T(n) = a T(n/b) + f(n) n log b a f(n) Specifiy, f(n) = O(n log b a ε ) for some const ε > 0 Solution: T(n) = Θ(n log b a ) Strassen matrix multiplication: a =7, b=2, f(n) = n 2 è T 1 (n) = Θ(n log 2 7 ) = O(n 2.81 ) 17

18 Master Method CASE 2 T(n) = a T(n/b) + f(n) n log b a f(n) Specifiy, f(n) = Θ(n log b a lg k n) for some const k 0 Solution: T(n) = Θ(n log b a lg k+1 n)) quicksort work: a=2, b=2, f(n)=n, k=0 è T 1 (n) = Θ(n lg n) qsort span: a=1, b=2, f(n)=lg n, k=1 è T (n) = Θ(lg 2 n) 18

19 Master Method CASE 3 T(n) = a T(n/b) + f(n) n log b a f(n) Specifiy, f(n) = Ω(n log b a + ε ) for some const ε>0, and (regularity) a f(n/b) c f(n) for some const c<1 Solution: T(n) = Θ(f(n)) Eg: qsort span (bad version): a=1, b=2, f(n)=n è T (n) = Θ(n) 19

20 Master Method Summary T(n) = a T(n/b) + f(n) CASE 1: f (n) = O(n log ba ε ), constant ε > 0 T(n) = Θ(n log ba ). CASE 2: f (n) = Θ(n log ba lg k n), constant k 0 T(n) = Θ(n log ba lg k+1 n). CASE 3: f (n) = Ω(n log ba + ε ), constant ε > 0, and regularity condition T(n) = Θ(f(n)). 20

21 otential arallelism Because the Span Law dictates that T T, the maximum possible speedup given T 1 and T is T 1 /T = potential parallelism = the average amount of work per step along the span. 21

22 Three Tips on arallelism 1. Minimize span to maximize parallelism. Try to generate 10 times more parallelism than processors for near-perfect linear speedup. 2. If you have plenty of parallelism, try to trade some if it off for reduced work overheads. 3. Use divide-and-conquer recursion or parallel loops rather than ing one small thing off after another. Do this: Not this: cilk_for (int i=0; i<n; ++i) { foo(i); } for (int i=0; i<n; ++i) { cilk_ foo(i); } cilk_sync; 22

23 Three Tips on Overheads 1. Make sure that work/#s is not too small. Coarsen by using function s and inlining near the leaves of recursion rather than ing. 2. arallelize outer loops if you can, not inner loops (otherwise, you ll have high burdened parallelism, which includes runtime and scheduling overhead). If you must parallelize an inner loop, coarsen it, but not too much. 500 iterations should be plenty coarse for even the most meager loop. Fewer iterations should suffice for fatter loops. 3. Use reducers only in sufficiently fat loops. 23

24 Scheduling A strand is a sequence of instructions that doesn t contain any parallel constructs Cilk allows the programmer to express potential parallelism in an application. The Cilk scheduler maps strands onto processors dynamiy at runtime. Since on-line schedulers are complicated, we ll explore the ideas with an off-line scheduler. Memory Network I/O $ $ $ 24

25 Greedy Scheduling IDEA: Do as much as possible on every step. Definition: A strand is ready if all its predecessors have executed. 25

26 Greedy Scheduling IDEA: Do as much as possible on every step. Definition: A strand is ready if all its predecessors have executed. Complete step strands ready. Run any. = 3 26

27 Greedy Scheduling IDEA: Do as much as possible on every step. Definition: A strand is ready if all its predecessors have executed. Complete step strands ready. Run any. Incomplete step < strands ready. Run all of them. = 3 27

28 Analysis of Greedy Theorem : Any greedy scheduler achieves T T 1 / + T. roof. # complete steps T 1 /, since each complete step performs work. # incomplete steps T, since each incomplete step reduces the span of the unexecuted dag by 1. = 3 28

29 Optimality of Greedy Theorem. Any greedy scheduler achieves within a factor of 2 of optimal. roof. Let T * be the execution time produced by the optimal scheduler. Since T * max{t 1 /, T } by the Work and Span Laws, we have T T 1 / + T 2 max{t 1 /, T } 2T *. 29

30 Linear Speedup Theorem. Any greedy scheduler achieves near-perfect linear speedup whenever T 1 /T. roof. Since T 1 /T is equivalent to T T 1 /, the Greedy Scheduling Theorem gives us T T 1 / + T T 1 /. Thus, the speedup is T 1 /T. Definition. The quantity T 1 /T is ed the parallel slackness. 30

31 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Call! 31

32 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Spawn! 32

33 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Spawn! Call! Spawn! 33

34 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Return! 34

35 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Return! 35

36 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Steal! When a worker runs out of work, it steals from the top of a random victim s deque. 36

37 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Steal! When a worker runs out of work, it steals from the top of a random victim s deque. 37

38 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack When a worker runs out of work, it steals from the top of a random victim s deque. 38

39 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Spawn! When a worker runs out of work, it steals from the top of a random victim s deque. 39

40 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack When a worker runs out of work, it steals from the top of a random victim s deque. 40

41 Cilk Runtime System Each worker (processor) maintains a work deque of ready strands, and it manipulates the bottom of the deque like a stack Theorem: With sufficient parallelism, workers steal infrequently linear speed-up. 41

CS 140 : Non-numerical Examples with Cilk++

CS 140 : Non-numerical Examples with Cilk++ CS 140 : Non-numerical Examples with Cilk++ Divide and conquer paradigm for Cilk++ Quicksort Mergesort Thanks to Charles E. Leiserson for some of these slides 1 Work and Span (Recap) T P = execution time

More information

CS 240A: Shared Memory & Multicore Programming with Cilk++

CS 240A: Shared Memory & Multicore Programming with Cilk++ CS 240A: Shared Memory & Multicore rogramming with Cilk++ Multicore and NUMA architectures Multithreaded rogramming Cilk++ as a concurrency platform Work and Span Thanks to Charles E. Leiserson for some

More information

Parallelism and Performance

Parallelism and Performance 6.172 erformance Engineering of Software Systems LECTURE 13 arallelism and erformance Charles E. Leiserson October 26, 2010 2010 Charles E. Leiserson 1 Amdahl s Law If 50% of your application is parallel

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 1 2 cilk for Loops 3 4 Measuring arallelism in ractice 5

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 2 cilk for Loops 3 4 Measuring arallelism in ractice 5 Announcements

More information

Multithreaded Parallelism and Performance Measures

Multithreaded Parallelism and Performance Measures Multithreaded Parallelism and Performance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 02 - CS 9535 arallelism Complexity Measures 2 cilk for Loops 3 Measuring

More information

Multithreaded Programming in. Cilk LECTURE 1. Charles E. Leiserson

Multithreaded Programming in. Cilk LECTURE 1. Charles E. Leiserson Multithreaded Programming in Cilk LECTURE 1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

More information

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism. Cilk Plus The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.) Developed originally by Cilk Arts, an MIT spinoff,

More information

Multithreaded Parallelism on Multicore Architectures

Multithreaded Parallelism on Multicore Architectures Multithreaded Parallelism on Multicore Architectures Marc Moreno Maza University of Western Ontario, Canada CS2101 November 2012 Plan 1 Multicore programming Multicore architectures 2 Cilk / Cilk++ / Cilk

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming

More information

CS 140 : Numerical Examples on Shared Memory with Cilk++

CS 140 : Numerical Examples on Shared Memory with Cilk++ CS 140 : Numerical Examples on Shared Memory with Cilk++ Matrix-matrix multiplication Matrix-vector multiplication Hyperobjects Thanks to Charles E. Leiserson for some of these slides 1 Work and Span (Recap)

More information

Parallel Algorithms CSE /22/2015. Outline of this lecture: 1 Implementation of cilk for. 2. Parallel Matrix Multiplication

Parallel Algorithms CSE /22/2015. Outline of this lecture: 1 Implementation of cilk for. 2. Parallel Matrix Multiplication CSE 539 01/22/2015 Parallel Algorithms Lecture 3 Scribe: Angelina Lee Outline of this lecture: 1. Implementation of cilk for 2. Parallel Matrix Multiplication 1 Implementation of cilk for We mentioned

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 19 January 2017 Outline for Today Threaded programming

More information

CSE 260 Lecture 19. Parallel Programming Languages

CSE 260 Lecture 19. Parallel Programming Languages CSE 260 Lecture 19 Parallel Programming Languages Announcements Thursday s office hours are cancelled Office hours on Weds 2p to 4pm Jing will hold OH, too, see Moodle Scott B. Baden /CSE 260/ Winter 2014

More information

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle

Cilk. Cilk In 2008, ACM SIGPLAN awarded Best influential paper of Decade. Cilk : Biggest principle CS528 Slides are adopted from http://supertech.csail.mit.edu/cilk/ Charles E. Leiserson A Sahu Dept of CSE, IIT Guwahati HPC Flow Plan: Before MID Processor + Super scalar+ Vector Unit Serial C/C++ Coding

More information

CSE 613: Parallel Programming

CSE 613: Parallel Programming CSE 613: Parallel Programming Lecture 3 ( The Cilk++ Concurrency Platform ) ( inspiration for many slides comes from talks given by Charles Leiserson and Matteo Frigo ) Rezaul A. Chowdhury Department of

More information

Multithreaded Programming in. Cilk LECTURE 2. Charles E. Leiserson

Multithreaded Programming in. Cilk LECTURE 2. Charles E. Leiserson Multithreaded Programming in Cilk LECTURE 2 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

More information

Multicore programming in CilkPlus

Multicore programming in CilkPlus Multicore programming in CilkPlus Marc Moreno Maza University of Western Ontario, Canada CS3350 March 16, 2015 CilkPlus From Cilk to Cilk++ and Cilk Plus Cilk has been developed since 1994 at the MIT Laboratory

More information

Multithreaded Algorithms Part 2. Dept. of Computer Science & Eng University of Moratuwa

Multithreaded Algorithms Part 2. Dept. of Computer Science & Eng University of Moratuwa CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 12 Multithreaded Algorithms Part 2 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Outline: Multithreaded Algorithms Part

More information

Quicksort. Repeat the process recursively for the left- and rightsub-blocks.

Quicksort. Repeat the process recursively for the left- and rightsub-blocks. Quicksort As the name implies, this is the fastest known sorting algorithm in practice. It is excellent for average input but bad for the worst-case input. (you will see later). Basic idea: (another divide-and-conquer

More information

Analysis of Multithreaded Algorithms

Analysis of Multithreaded Algorithms Analysis of Multithreaded Algorithms Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS4402-9535 (Moreno Maza) Analysis of Multithreaded Algorithms CS4402-9535 1 / 27 Plan 1 Matrix

More information

The Limits of Sorting Divide-and-Conquer Comparison Sorts II

The Limits of Sorting Divide-and-Conquer Comparison Sorts II The Limits of Sorting Divide-and-Conquer Comparison Sorts II CS 311 Data Structures and Algorithms Lecture Slides Monday, October 12, 2009 Glenn G. Chappell Department of Computer Science University of

More information

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is

More information

Multithreaded Parallelism on Multicore Architectures

Multithreaded Parallelism on Multicore Architectures Multithreaded Parallelism on Multicore Architectures Marc Moreno Maza University of Western Ontario, Canada CS2101 March 2012 Plan 1 Multicore programming Multicore architectures 2 Cilk / Cilk++ / Cilk

More information

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer Module 2: Divide and Conquer Divide and Conquer Control Abstraction for Divide &Conquer 1 Recurrence equation for Divide and Conquer: If the size of problem p is n and the sizes of the k sub problems are

More information

CS 6463: AT Computational Geometry Spring 2006 Convex Hulls Carola Wenk

CS 6463: AT Computational Geometry Spring 2006 Convex Hulls Carola Wenk CS 6463: AT Computational Geometry Spring 2006 Convex Hulls Carola Wenk 8/29/06 CS 6463: AT Computational Geometry 1 Convex Hull Problem Given a set of pins on a pinboard and a rubber band around them.

More information

Analysis of Algorithms - Quicksort -

Analysis of Algorithms - Quicksort - Analysis of Algorithms - Quicksort - Andreas Ermedahl MRTC (Mälardalens Real-Time Reseach Center) andreas.ermedahl@mdh.se Autumn 2003 Quicksort Proposed by C.A.R. Hoare in 962. Divide- and- conquer algorithm

More information

Cilk, Matrix Multiplication, and Sorting

Cilk, Matrix Multiplication, and Sorting 6.895 Theory of Parallel Systems Lecture 2 Lecturer: Charles Leiserson Cilk, Matrix Multiplication, and Sorting Lecture Summary 1. Parallel Processing With Cilk This section provides a brief introduction

More information

Chapter 4. Divide-and-Conquer. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Chapter 4. Divide-and-Conquer. Copyright 2007 Pearson Addison-Wesley. All rights reserved. Chapter 4 Divide-and-Conquer Copyright 2007 Pearson Addison-Wesley. All rights reserved. Divide-and-Conquer The most-well known algorithm design strategy: 2. Divide instance of problem into two or more

More information

EECS 2011M: Fundamentals of Data Structures

EECS 2011M: Fundamentals of Data Structures M: Fundamentals of Data Structures Instructor: Suprakash Datta Office : LAS 3043 Course page: http://www.eecs.yorku.ca/course/2011m Also on Moodle Note: Some slides in this lecture are adopted from James

More information

DIVIDE & CONQUER. Problem of size n. Solution to sub problem 1

DIVIDE & CONQUER. Problem of size n. Solution to sub problem 1 DIVIDE & CONQUER Definition: Divide & conquer is a general algorithm design strategy with a general plan as follows: 1. DIVIDE: A problem s instance is divided into several smaller instances of the same

More information

CSC Design and Analysis of Algorithms

CSC Design and Analysis of Algorithms CSC 8301- Design and Analysis of Algorithms Lecture 6 Divide and Conquer Algorithm Design Technique Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide a problem instance into two

More information

Divide & Conquer. 2. Conquer the sub-problems by solving them recursively. 1. Divide the problem into number of sub-problems

Divide & Conquer. 2. Conquer the sub-problems by solving them recursively. 1. Divide the problem into number of sub-problems Divide & Conquer Divide & Conquer The Divide & Conquer approach breaks down the problem into multiple smaller sub-problems, solves the sub-problems recursively, then combines the solutions of the sub-problems

More information

CSC Design and Analysis of Algorithms. Lecture 6. Divide and Conquer Algorithm Design Technique. Divide-and-Conquer

CSC Design and Analysis of Algorithms. Lecture 6. Divide and Conquer Algorithm Design Technique. Divide-and-Conquer CSC 8301- Design and Analysis of Algorithms Lecture 6 Divide and Conquer Algorithm Design Technique Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide a problem instance into two

More information

CS473 - Algorithms I

CS473 - Algorithms I CS473 - Algorithms I Lecture 4 The Divide-and-Conquer Design Paradigm View in slide-show mode 1 Reminder: Merge Sort Input array A sort this half sort this half Divide Conquer merge two sorted halves Combine

More information

Introduction to Multithreaded Algorithms

Introduction to Multithreaded Algorithms Introduction to Multithreaded Algorithms CCOM5050: Design and Analysis of Algorithms Chapter VII Selected Topics T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. Introduction to algorithms, 3 rd

More information

Mergesort again. 1. Split the list into two equal parts

Mergesort again. 1. Split the list into two equal parts Quicksort Mergesort again 1. Split the list into two equal parts 5 3 9 2 8 7 3 2 1 4 5 3 9 2 8 7 3 2 1 4 Mergesort again 2. Recursively mergesort the two parts 5 3 9 2 8 7 3 2 1 4 2 3 5 8 9 1 2 3 4 7 Mergesort

More information

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances recursively 3. Obtain solution to original

More information

Parallel Programming with Cilk

Parallel Programming with Cilk Project 4-1 Parallel Programming with Cilk Out: Friday, 23 Oct 2009 Due: 10AM on Wednesday, 4 Nov 2009 In this project you will learn how to parallel program with Cilk. In the first part of the project,

More information

Cilk Plus: Multicore extensions for C and C++

Cilk Plus: Multicore extensions for C and C++ Cilk Plus: Multicore extensions for C and C++ Matteo Frigo 1 June 6, 2011 1 Some slides courtesy of Prof. Charles E. Leiserson of MIT. Intel R Cilk TM Plus What is it? C/C++ language extensions supporting

More information

Divide and Conquer 4-0

Divide and Conquer 4-0 Divide and Conquer 4-0 Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances recursively 3. Obtain

More information

CSE 3101: Introduction to the Design and Analysis of Algorithms. Office hours: Wed 4-6 pm (CSEB 3043), or by appointment.

CSE 3101: Introduction to the Design and Analysis of Algorithms. Office hours: Wed 4-6 pm (CSEB 3043), or by appointment. CSE 3101: Introduction to the Design and Analysis of Algorithms Instructor: Suprakash Datta (datta[at]cse.yorku.ca) ext 77875 Lectures: Tues, BC 215, 7 10 PM Office hours: Wed 4-6 pm (CSEB 3043), or by

More information

The DAG Model; Analysis of For-Loops; Reduction

The DAG Model; Analysis of For-Loops; Reduction CSE341T 09/06/2017 Lecture 3 The DAG Model; Analysis of For-Loops; Reduction We will now formalize the DAG model. We will also see how parallel for loops are implemented and what are reductions. 1 The

More information

11. Divide and Conquer

11. Divide and Conquer The whole is of necessity prior to the part. Aristotle 11. Divide and Conquer 11.1. Divide and Conquer Many recursive algorithms follow the divide and conquer philosophy. They divide the problem into smaller

More information

Lecture 8: Mergesort / Quicksort Steven Skiena

Lecture 8: Mergesort / Quicksort Steven Skiena Lecture 8: Mergesort / Quicksort Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.stonybrook.edu/ skiena Problem of the Day Give an efficient

More information

Divide-and-Conquer. Dr. Yingwu Zhu

Divide-and-Conquer. Dr. Yingwu Zhu Divide-and-Conquer Dr. Yingwu Zhu Divide-and-Conquer The most-well known algorithm design technique: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances independently

More information

Performance Optimization Part 1: Work Distribution and Scheduling

Performance Optimization Part 1: Work Distribution and Scheduling Lecture 5: Performance Optimization Part 1: Work Distribution and Scheduling Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2017 Programming for high performance Optimizing the

More information

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 5. The Towers of Hanoi. Divide and Conquer

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 5. The Towers of Hanoi. Divide and Conquer Taking Stock IE170: Algorithms in Systems Engineering: Lecture 5 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University January 24, 2007 Last Time In-Place, Out-of-Place Count

More information

CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY

CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY CILK/CILK++ AND REDUCERS YUNMING ZHANG RICE UNIVERSITY 1 OUTLINE CILK and CILK++ Language Features and Usages Work stealing runtime CILK++ Reducers Conclusions 2 IDEALIZED SHARED MEMORY ARCHITECTURE Hardware

More information

A Minicourse on Dynamic Multithreaded Algorithms

A Minicourse on Dynamic Multithreaded Algorithms Introduction to Algorithms December 5, 005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 9 A Minicourse on Dynamic Multithreaded Algorithms

More information

4. Sorting and Order-Statistics

4. Sorting and Order-Statistics 4. Sorting and Order-Statistics 4. Sorting and Order-Statistics The sorting problem consists in the following : Input : a sequence of n elements (a 1, a 2,..., a n ). Output : a permutation (a 1, a 2,...,

More information

Reducers and other Cilk++ hyperobjects

Reducers and other Cilk++ hyperobjects Reducers and other Cilk++ hyperobjects Matteo Frigo (Intel) ablo Halpern (Intel) Charles E. Leiserson (MIT) Stephen Lewin-Berlin (Intel) August 11, 2009 Collision detection Assembly: Represented as a tree

More information

Today: Amortized Analysis (examples) Multithreaded Algs.

Today: Amortized Analysis (examples) Multithreaded Algs. Today: Amortized Analysis (examples) Multithreaded Algs. COSC 581, Algorithms March 11, 2014 Many of these slides are adapted from several online sources Reading Assignments Today s class: Chapter 17 (Amortized

More information

A Quick Introduction To The Intel Cilk Plus Runtime

A Quick Introduction To The Intel Cilk Plus Runtime A Quick Introduction To The Intel Cilk Plus Runtime 6.S898: Advanced Performance Engineering for Multicore Applications March 8, 2017 Adapted from slides by Charles E. Leiserson, Saman P. Amarasinghe,

More information

Performance Optimization Part 1: Work Distribution and Scheduling

Performance Optimization Part 1: Work Distribution and Scheduling ( how to be l33t ) Lecture 6: Performance Optimization Part 1: Work Distribution and Scheduling Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Tunes Kaleo Way Down We Go

More information

Practical Session #11 - Sort properties, QuickSort algorithm, Selection

Practical Session #11 - Sort properties, QuickSort algorithm, Selection Practical Session #11 - Sort properties, QuickSort algorithm, Selection Quicksort quicksort( A, low, high ) if( high > low ) pivot partition( A, low, high ) // quicksort( A, low, pivot-1 ) quicksort( A,

More information

CS Divide and Conquer

CS Divide and Conquer CS483-07 Divide and Conquer Instructor: Fei Li Room 443 ST II Office hours: Tue. & Thur. 1:30pm - 2:30pm or by appointments lifei@cs.gmu.edu with subject: CS483 http://www.cs.gmu.edu/ lifei/teaching/cs483_fall07/

More information

Quicksort (Weiss chapter 8.6)

Quicksort (Weiss chapter 8.6) Quicksort (Weiss chapter 8.6) Recap of before Easter We saw a load of sorting algorithms, including mergesort To mergesort a list: Split the list into two halves Recursively mergesort the two halves Merge

More information

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems.

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems. 2.3 Designing algorithms There are many ways to design algorithms. Insertion sort uses an incremental approach: having sorted the subarray A[1 j - 1], we insert the single element A[j] into its proper

More information

Algorithms: Efficiency, Analysis, techniques for solving problems using computer approach or methodology but not programming

Algorithms: Efficiency, Analysis, techniques for solving problems using computer approach or methodology but not programming Chapter 1 Algorithms: Efficiency, Analysis, and dod Order Preface techniques for solving problems using computer approach or methodology but not programming language e.g., search Collie Collean in phone

More information

Algorithms. MIS, KUAS, 2007 Ding

Algorithms. MIS, KUAS, 2007 Ding Algorithms MIS, KUAS, 2007 Jen-Wen Ding 1 2 2 What is Divide-and-Conquer? Divide-and-Conquer divides an instance of a problem into two or more smaller instances. The smaller instances are usually instances

More information

CS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017

CS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017 CS 137 Part 8 Merge Sort, Quick Sort, Binary Search November 20th, 2017 This Week We re going to see two more complicated sorting algorithms that will be our first introduction to O(n log n) sorting algorithms.

More information

Divide-and-Conquer CSE 680

Divide-and-Conquer CSE 680 Divide-and-Conquer CSE 680 1 Introduction Given an instance x of a problem, the divide-and-conquer method works as follows. function DAQ(x) if x is sufficiently small or simple then solve it directly else

More information

Sorting Algorithms. + Analysis of the Sorting Algorithms

Sorting Algorithms. + Analysis of the Sorting Algorithms Sorting Algorithms + Analysis of the Sorting Algorithms Insertion Sort What if first k elements of array are already sorted? 4, 7, 12, 5, 19, 16 We can shift the tail of the sorted elements list down and

More information

Compsci 590.3: Introduction to Parallel Computing

Compsci 590.3: Introduction to Parallel Computing Compsci 590.3: Introduction to Parallel Computing Alvin R. Lebeck Slides based on this from the University of Oregon Admin Logistics Homework #3 Use script Project Proposals Document: see web site» Due

More information

Data Structures and Algorithms Running time, Divide and Conquer January 25, 2018

Data Structures and Algorithms Running time, Divide and Conquer January 25, 2018 Data Structures and Algorithms Running time, Divide and Conquer January 5, 018 Consider the problem of computing n for any non-negative integer n. similar looking algorithms to solve this problem. Below

More information

Case Study: Matrix Multiplication. 6.S898: Advanced Performance Engineering for Multicore Applications February 22, 2017

Case Study: Matrix Multiplication. 6.S898: Advanced Performance Engineering for Multicore Applications February 22, 2017 Case Study: Matrix Multiplication 6.S898: Advanced Performance Engineering for Multicore Applications February 22, 2017 1 4k-by-4k Matrix Multiplication Version Implementation Running time (s) GFLOPS Absolute

More information

Parallel Algorithms. 6/12/2012 6:42 PM gdeepak.com 1

Parallel Algorithms. 6/12/2012 6:42 PM gdeepak.com 1 Parallel Algorithms 1 Deliverables Parallel Programming Paradigm Matrix Multiplication Merge Sort 2 Introduction It involves using the power of multiple processors in parallel to increase the performance

More information

Selection (deterministic & randomized): finding the median in linear time

Selection (deterministic & randomized): finding the median in linear time Lecture 4 Selection (deterministic & randomized): finding the median in linear time 4.1 Overview Given an unsorted array, how quickly can one find the median element? Can one do it more quickly than bysorting?

More information

Quick Sort. CSE Data Structures May 15, 2002

Quick Sort. CSE Data Structures May 15, 2002 Quick Sort CSE 373 - Data Structures May 15, 2002 Readings and References Reading Section 7.7, Data Structures and Algorithm Analysis in C, Weiss Other References C LR 15-May-02 CSE 373 - Data Structures

More information

CIS 121 Data Structures and Algorithms with Java Spring Code Snippets and Recurrences Monday, January 29/Tuesday, January 30

CIS 121 Data Structures and Algorithms with Java Spring Code Snippets and Recurrences Monday, January 29/Tuesday, January 30 CIS 11 Data Structures and Algorithms with Java Spring 018 Code Snippets and Recurrences Monday, January 9/Tuesday, January 30 Learning Goals Practice solving recurrences and proving asymptotic bounds

More information

CPSC 320 Midterm 2 Thursday March 13th, 2014

CPSC 320 Midterm 2 Thursday March 13th, 2014 CPSC 320 Midterm 2 Thursday March 13th, 2014 [12] 1. Answer each question with True or False, and then justify your answer briefly. [2] (a) The Master theorem can be applied to the recurrence relation

More information

Simple Sorting Algorithms

Simple Sorting Algorithms Simple Sorting Algorithms Outline We are going to look at three simple sorting techniques: Bubble Sort, Selection Sort, and Insertion Sort We are going to develop the notion of a loop invariant We will

More information

Data Structure Lecture#17: Internal Sorting 2 (Chapter 7) U Kang Seoul National University

Data Structure Lecture#17: Internal Sorting 2 (Chapter 7) U Kang Seoul National University Data Structure Lecture#17: Internal Sorting 2 (Chapter 7) U Kang Seoul National University U Kang 1 In This Lecture Main ideas and analysis of Merge sort Main ideas and analysis of Quicksort U Kang 2 Merge

More information

Computer Science & Engineering 423/823 Design and Analysis of Algorithms

Computer Science & Engineering 423/823 Design and Analysis of Algorithms Computer Science & Engineering 423/823 Design and Analysis of s Lecture 01 Medians and s (Chapter 9) Stephen Scott (Adapted from Vinodchandran N. Variyam) 1 / 24 Spring 2010 Given an array A of n distinct

More information

We can use a max-heap to sort data.

We can use a max-heap to sort data. Sorting 7B N log N Sorts 1 Heap Sort We can use a max-heap to sort data. Convert an array to a max-heap. Remove the root from the heap and store it in its proper position in the same array. Repeat until

More information

Divide-and-Conquer. Combine solutions to sub-problems into overall solution. Break up problem of size n into two equal parts of size!n.

Divide-and-Conquer. Combine solutions to sub-problems into overall solution. Break up problem of size n into two equal parts of size!n. Chapter 5 Divide and Conquer Slides by Kevin Wayne. Copyright 25 Pearson-Addon Wesley. All rights reserved. Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve each part recursively.

More information

Divide and Conquer Algorithms

Divide and Conquer Algorithms CSE341T 09/13/2017 Lecture 5 Divide and Conquer Algorithms We have already seen a couple of divide and conquer algorithms in this lecture. The reduce algorithm and the algorithm to copy elements of the

More information

Randomized Algorithms, Quicksort and Randomized Selection

Randomized Algorithms, Quicksort and Randomized Selection CMPS 2200 Fall 2017 Randomized Algorithms, Quicksort and Randomized Selection Carola Wenk Slides by Carola Wenk and Charles Leiserson CMPS 2200 Intro. to Algorithms 1 Deterministic Algorithms Runtime for

More information

Sorting. Sorting. Stable Sorting. In-place Sort. Bubble Sort. Bubble Sort. Selection (Tournament) Heapsort (Smoothsort) Mergesort Quicksort Bogosort

Sorting. Sorting. Stable Sorting. In-place Sort. Bubble Sort. Bubble Sort. Selection (Tournament) Heapsort (Smoothsort) Mergesort Quicksort Bogosort Principles of Imperative Computation V. Adamchik CS 15-1 Lecture Carnegie Mellon University Sorting Sorting Sorting is ordering a list of objects. comparison non-comparison Hoare Knuth Bubble (Shell, Gnome)

More information

SORTING AND SELECTION

SORTING AND SELECTION 2 < > 1 4 8 6 = 9 CHAPTER 12 SORTING AND SELECTION ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN JAVA, GOODRICH, TAMASSIA AND GOLDWASSER (WILEY 2016)

More information

CS 360 Exam 1 Fall 2014 Name. 1. Answer the following questions about each code fragment below. [8 points]

CS 360 Exam 1 Fall 2014 Name. 1. Answer the following questions about each code fragment below. [8 points] CS 360 Exam 1 Fall 2014 Name 1. Answer the following questions about each code fragment below. [8 points] for (v=1; v

More information

Practice Problems for the Final

Practice Problems for the Final ECE-250 Algorithms and Data Structures (Winter 2012) Practice Problems for the Final Disclaimer: Please do keep in mind that this problem set does not reflect the exact topics or the fractions of each

More information

University of Waterloo Department of Electrical and Computer Engineering ECE 250 Data Structures and Algorithms. Final Examination T09:00

University of Waterloo Department of Electrical and Computer Engineering ECE 250 Data Structures and Algorithms. Final Examination T09:00 University of Waterloo Department of Electrical and Computer Engineering ECE 250 Data Structures and Algorithms Instructor: Douglas Wilhelm Harder Time: 2.5 hours Aides: none 18 pages Final Examination

More information

CS S-11 Sorting in Θ(nlgn) 1. Base Case: A list of length 1 or length 0 is already sorted. Recursive Case:

CS S-11 Sorting in Θ(nlgn) 1. Base Case: A list of length 1 or length 0 is already sorted. Recursive Case: CS245-2015S-11 Sorting in Θ(nlgn) 1 11-0: Merge Sort Recursive Sorting Base Case: A list of length 1 or length 0 is already sorted Recursive Case: Split the list in half Recursively sort two halves Merge

More information

COMP2012H Spring 2014 Dekai Wu. Sorting. (more on sorting algorithms: mergesort, quicksort, heapsort)

COMP2012H Spring 2014 Dekai Wu. Sorting. (more on sorting algorithms: mergesort, quicksort, heapsort) COMP2012H Spring 2014 Dekai Wu Sorting (more on sorting algorithms: mergesort, quicksort, heapsort) Merge Sort Recursive sorting strategy. Let s look at merge(.. ) first. COMP2012H (Sorting) 2 COMP2012H

More information

Divide and Conquer. Design and Analysis of Algorithms Andrei Bulatov

Divide and Conquer. Design and Analysis of Algorithms Andrei Bulatov Divide and Conquer Design and Analysis of Algorithms Andrei Bulatov Algorithms Divide and Conquer 4-2 Divide and Conquer, MergeSort Recursive algorithms: Call themselves on subproblem Divide and Conquer

More information

Math 501 Solutions to Homework Assignment 10

Math 501 Solutions to Homework Assignment 10 Department of Mathematics Discrete Mathematics Math 501 Solutions to Homework Assignment 10 Section 7.3 Additional Exercises 1. int CountValuesLessThanT(int a[], int n, int T) { int count = 0; for (int

More information

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Divide and Conquer

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Divide and Conquer Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Divide and Conquer Divide and-conquer is a very common and very powerful algorithm design technique. The general idea:

More information

The Cilkview Scalability Analyzer

The Cilkview Scalability Analyzer The Cilkview Scalability Analyzer Yuxiong He Charles E. Leiserson William M. Leiserson Intel Corporation Nashua, New Hampshire ABSTRACT The Cilkview scalability analyzer is a software tool for profiling,

More information

Chapter 2 Divid d e-an -Conquer 1

Chapter 2 Divid d e-an -Conquer 1 Chapter 2 Divide-and-Conquerid d 1 A top-down approach Patterned after the strategy employed by Napoleon Divide an instance of a problem recursively into two or more smaller instances until the solutions

More information

Unit Outline. Comparing Sorting Algorithms Heapsort Mergesort Quicksort More Comparisons Complexity of Sorting 2 / 33

Unit Outline. Comparing Sorting Algorithms Heapsort Mergesort Quicksort More Comparisons Complexity of Sorting 2 / 33 Unit #4: Sorting CPSC : Basic Algorithms and Data Structures Anthony Estey, Ed Knorr, and Mehrdad Oveisi 0W Unit Outline Comparing Sorting Algorithms Heapsort Mergesort Quicksort More Comparisons Complexity

More information

CS Divide and Conquer

CS Divide and Conquer CS483-07 Divide and Conquer Instructor: Fei Li Room 443 ST II Office hours: Tue. & Thur. 1:30pm - 2:30pm or by appointments lifei@cs.gmu.edu with subject: CS483 http://www.cs.gmu.edu/ lifei/teaching/cs483_fall07/

More information

Data Structures and Algorithms CSE 465

Data Structures and Algorithms CSE 465 Data Structures and Algorithms CSE 465 LECTURE 4 More Divide and Conquer Binary Search Exponentiation Multiplication Sofya Raskhodnikova and Adam Smith Review questions How long does Merge Sort take on

More information

COMP 3403 Algorithm Analysis Part 2 Chapters 4 5. Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University

COMP 3403 Algorithm Analysis Part 2 Chapters 4 5. Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University COMP 3403 Algorithm Analysis Part 2 Chapters 4 5 Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University Chapter 4 Decrease and Conquer Chapter 4 43 Chapter 4: Decrease and Conquer Idea:

More information

CS483 Analysis of Algorithms Lecture 03 Divide-n-Conquer

CS483 Analysis of Algorithms Lecture 03 Divide-n-Conquer CS483 Analysis of Algorithms Lecture 03 Divide-n-Conquer Jyh-Ming Lien February 05, 2009 this lecture note is based on Algorithms by S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani and to the Design

More information

Introduction to Algorithms

Introduction to Algorithms Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Clifford Stein Introduction to Algorithms Third Edition The MIT Press Cambridge, Massachusetts London, England 27 Multithreaded Algorithms The vast

More information

e-pg PATHSHALA- Computer Science Design and Analysis of Algorithms Module 10

e-pg PATHSHALA- Computer Science Design and Analysis of Algorithms Module 10 e-pg PATHSHALA- Computer Science Design and Analysis of Algorithms Module 10 Component-I (B) Description of Module Items Subject Name Computer Science Description of Module Paper Name Module Name/Title

More information

Plotting run-time graphically. Plotting run-time graphically. CS241 Algorithmics - week 1 review. Prefix Averages - Algorithm #1

Plotting run-time graphically. Plotting run-time graphically. CS241 Algorithmics - week 1 review. Prefix Averages - Algorithm #1 CS241 - week 1 review Special classes of algorithms: logarithmic: O(log n) linear: O(n) quadratic: O(n 2 ) polynomial: O(n k ), k 1 exponential: O(a n ), a > 1 Classifying algorithms is generally done

More information