Divide and Conquer Algorithms


CSE341T 09/13/2017
Lecture 5

We have already seen a couple of divide and conquer algorithms in this class: the reduce algorithm and the algorithm that copies the elements of an array are both D&C algorithms. We have also seen a divide and conquer matrix multiplication algorithm. We will use this technique a lot in this class, so let's formalize it a bit.

Divide and conquer algorithms generally have three steps: divide the problem into subproblems, recursively solve the subproblems, and combine the solutions of the subproblems into a solution for the original problem. The structure of a divide-and-conquer algorithm follows the structure of a proof by (strong) induction, which makes it easy to show correctness and also to figure out cost bounds. The general structure looks as follows:

Base Case: When the problem is sufficiently small, return the trivial answer directly, or resort to a different, usually simpler, algorithm that works well on small instances.

Inductive Step: First, the algorithm divides the current instance I into parts, commonly referred to as subproblems, each smaller than the original problem. Then it recurses on each of the parts to obtain answers for them. In proofs, this is where we assume inductively that the answers for the parts are correct. Based on this assumption, it combines the answers to produce an answer for the original instance I.

This technique is even more useful for parallel algorithms: generally, you can solve the subproblems in parallel. If the divide and combine steps are inexpensive, then you are done. If either the divide or the combine step (or both) is expensive, however, you may want to parallelize it (or them), which can be difficult, depending on the algorithm.

Let us assume that the subproblems can be solved independently. Say the problem of size n is broken into k subproblems of size n_1, ..., n_k. How would you write the Cilk Plus program? What if k = 2?
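Here is a minimal Cilk Plus sketch of the k = 2 case, instantiated on array summation (the reduce from earlier lectures) so that it is self-contained. The cutoff n0 is an arbitrary grain size, and a Cilk-enabled compiler (e.g., OpenCilk, which provides cilk_spawn and cilk_sync through <cilk/cilk.h>) is assumed.

    #include <cilk/cilk.h>
    #include <cstddef>

    // Generic D&C shape for k = 2: divide, recurse in parallel, sync, combine.
    long long reduce_sum(const long long* A, std::size_t n) {
        const std::size_t n0 = 1024;                       // base-case cutoff (arbitrary)
        if (n <= n0) {                                     // base case: solve directly
            long long s = 0;
            for (std::size_t i = 0; i < n; ++i) s += A[i];
            return s;
        }
        std::size_t half = n / 2;                          // divide into two parts
        long long left = cilk_spawn reduce_sum(A, half);   // F(n_1), runs in parallel
        long long right = reduce_sum(A + half, n - half);  // F(n_2), runs in this strand
        cilk_sync;                                         // wait for the spawned call
        return left + right;                               // combine
    }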

In pseudocode, the k = 2 strategy looks like this:

    F(n)
        if n <= n_0
            then Base-Case
                 return
        Divide into 2 parts of size n_1 and n_2
        spawn F(n_1)
        F(n_2)
        sync
        Combine

With the strategy above, the work is

    W(n) = W_divide(n) + W(n_1) + W(n_2) + W_combine(n)

and the span is

    S(n) = S_divide(n) + max{S(n_1), S(n_2)} + S_combine(n)

Note that the work recurrence simply adds up the work across all components. More interesting is the span recurrence. First, note that a divide and conquer algorithm has to split a problem instance into subproblems before these subproblems are recursively solved. Furthermore, it cannot combine the results from these subproblems into the answer for the original instance until the recursive calls on the subproblems are complete. This forms a chain of sequential dependencies, which is why we add their spans together. The parallel execution takes place among the recursive calls, since we assume the subproblems can be solved independently; this is why we take the max over the subproblems' spans.

Now consider arbitrary k. What is the pseudocode?

    F(n)
        if n <= n_0
            then Base-Case
                 return
        Divide into k parts of size n_1, n_2, ..., n_k
        parallel for i <- 1 to k
            do F(n_i)
        Combine

The work is

    W(n) = W_divide(n) + Σ_{i=1}^{k} W(n_i) + W_combine(n)

and the span is

    S(n) = S_divide(n) + lg k + max_{i=1}^{k} S(n_i) + S_combine(n),

where the lg k term is the span of the parallel for loop itself.
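As a sketch of the arbitrary-k skeleton, here is one level of the same summation split into k chunks; the even split and the value of k are arbitrary illustrative choices.

    #include <cilk/cilk.h>
    #include <cstddef>
    #include <vector>

    // One level of k-way D&C: divide into k chunks, solve each chunk
    // inside a parallel for, then combine the k partial answers.
    long long reduce_sum_k(const long long* A, std::size_t n, std::size_t k) {
        std::vector<long long> part(k, 0);           // answers for the k subproblems
        cilk_for (std::size_t i = 0; i < k; ++i) {   // parallel for i = 1 to k
            std::size_t lo = i * n / k, hi = (i + 1) * n / k;
            long long s = 0;
            for (std::size_t t = lo; t < hi; ++t) s += A[t];  // F(n_i)
            part[i] = s;
        }
        long long total = 0;                         // combine
        for (std::size_t i = 0; i < k; ++i) total += part[i];
        return total;
    }

The runtime executes a cilk_for as a balanced recursive split of the iteration range, which is where the extra lg k term in the span recurrence comes from.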

Applying these formulas results in familiar recurrences such as W(n) = 2W(n/2) + O(n). In the rest of this lecture, we'll get to see other recurrences and learn how to derive a closed form for them.

1 Maximum contiguous subsequence sum problem (MCSS)

In most divide-and-conquer algorithms you have encountered so far, the subproblems are smaller occurrences of the problem you are solving. This is not always the case. Often, you'll need more information from the subproblems to properly combine their results. In this case, you'll need to strengthen the problem, much in the same way that you strengthen an inductive hypothesis when doing an inductive proof. Let's take a look at an example: the maximum contiguous subsequence sum (MCSS) problem.

Definition 1 (The Maximum Contiguous Subsequence Sum (MCSS) Problem) Given a sequence of numbers s = <s_1, ..., s_n>, the maximum contiguous subsequence sum problem is to find

    max{ Σ_{k=i}^{j} s_k : 1 <= i <= n, i <= j <= n }

(i.e., the sum of the contiguous subsequence of s that has the largest value).

For example, the MCSS of the sequence <-2, -5, 4, 1, -2, 3> is 6, via the subsequence <4, 1, -2, 3>.

Algorithm 1: Brute Force

The brute force algorithm examines all possible subsequences, computes the sum of each, and takes the maximum. Note that every contiguous subsequence of s can be represented by a starting position i and an ending position j. We will use the shorthand s_{i..j} to denote the subsequence s_i, s_{i+1}, ..., s_j.

    MCSS(S[1..n])
        parallel for all tuples (i, j) such that 1 <= i <= n and i <= j <= n
            do A[i, j] <- spawn SUM(i, j)
        return MAX(A)
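For reference, here is a sequential C++ sketch of the brute force algorithm. The pseudocode's parallel loop and reduction-based SUM and MAX are replaced by plain loops, so this shows the cubic work but none of the parallelism; the function name is ours.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Brute force MCSS: for every pair (i, j), recompute SUM(i, j)
    // from scratch and keep the maximum. Assumes a nonempty input.
    long long mcss_brute(const std::vector<long long>& s) {
        long long best = s[0];
        for (std::size_t i = 0; i < s.size(); ++i)
            for (std::size_t j = i; j < s.size(); ++j) {
                long long sum = 0;                          // SUM(i, j)
                for (std::size_t t = i; t <= j; ++t) sum += s[t];
                best = std::max(best, sum);                 // MAX over all pairs
            }
        return best;
    }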

The total work is O(n^3). We learned last lecture that we can use reduction to compute SUM(i, j) in parallel, with Θ(n) work and Θ(lg n) span. Similarly, you can compute MAX using a reduction with the same asymptotic work and span. That means the total work of this brute force algorithm is Θ(n^3), and the span is Θ(lg n).

Exercise 1 Can you improve the work of the naïve algorithm to O(n^2)? What does this do to the span?

Algorithm 2: Divide and Conquer, Version 1.0

We'll design a divide-and-conquer algorithm for this problem.

Divide: Split the sequence in half.

Conquer: Recursively solve both halves.

Combine: This is the most interesting step. For example, imagine we split the sequence in the middle and get the following answers:

    L (mcss = 56) | R (mcss = 17)

There are three possibilities: (1) the maximum sum lies completely in the left subproblem, (2) the maximum sum lies completely in the right subproblem, and (3) the maximum sum spans the split point. The first two cases are easy. The more interesting case is when the largest sum crosses between the two subproblems: the maximum subsequence that spans the middle is the largest sum of a suffix on the left plus the largest sum of a prefix on the right.

    MCSS(S, i, j)
        if i = j
            then return S[i]
        mid <- floor((i + j)/2)
        L <- spawn MCSS(S, i, mid)
        R <- spawn MCSS(S, mid + 1, j)
        sync
        LS <- spawn MAXSUFFIXSUM(S, i, mid)
        RP <- MAXPREFIXSUM(S, mid + 1, j)
        sync
        return max(L, LS + RP, R)
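Here is a sequential C++ sketch of Version 1.0; the helper functions implement the naive linear-scan suffix and prefix sums discussed next, and all names are ours. In Cilk Plus the recursive calls and the two scans could be spawned exactly as in the pseudocode.

    #include <algorithm>
    #include <vector>

    // Best sum of a nonempty prefix of S[i..j], by a single left-to-right scan.
    long long maxPrefixSum(const std::vector<long long>& S, int i, int j) {
        long long run = 0, best = S[i];
        for (int t = i; t <= j; ++t) { run += S[t]; best = std::max(best, run); }
        return best;
    }

    // Best sum of a nonempty suffix of S[i..j], by a single right-to-left scan.
    long long maxSuffixSum(const std::vector<long long>& S, int i, int j) {
        long long run = 0, best = S[j];
        for (int t = j; t >= i; --t) { run += S[t]; best = std::max(best, run); }
        return best;
    }

    long long mcss_dc(const std::vector<long long>& S, int i, int j) {
        if (i == j) return S[i];                            // base case
        int mid = (i + j) / 2;
        long long L = mcss_dc(S, i, mid);                   // conquer left half
        long long R = mcss_dc(S, mid + 1, j);               // conquer right half
        long long cross = maxSuffixSum(S, i, mid)           // best sum crossing
                        + maxPrefixSum(S, mid + 1, j);      // the split point
        return std::max({L, cross, R});                     // combine: three cases
    }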

How would you calculate the suffix and prefix sums? If you do it in the naive sequential way (e.g., keep a running sum and update the best as you encounter each element), then the work and span are both Θ(n). Then the work of the MCSS algorithm is

    W(n) = 2W(n/2) + Θ(n) = Θ(n lg n)

and the span is

    S(n) = S(n/2) + Θ(n) = Θ(n)

This is not great, since the parallelism is only Θ(lg n). Note that you don't have to sync between the recursive calls and the prefix/suffix computations; however, omitting that sync would not help the span. We will show next week how to compute the max prefix and suffix sums in parallel, but for now we'll take it for granted that they can be done in Θ(n) work and Θ(lg n) span. This reduces the span to

    S(n) = S(n/2) + Θ(lg n) = Θ(lg² n)

Exercise 2 Solve the work and span recurrences without using the master method; that is, use the recursion tree method you learned in CSE 241. Also prove the bounds using induction, as you learned in CSE 241.

Algorithm 3: Divide and Conquer, Version 2.0

As it turns out, we can do better than O(n lg n) work. The key is to strengthen the (sub)problem, i.e., to solve a problem that is slightly more general, to get a faster algorithm. Looking back at our previous divide-and-conquer algorithm, the bottleneck is that the combine step takes linear work. Is there any useful information from the subproblems we could have used to make the combine step take constant work instead of linear work?

In the design of our previous algorithm, we took advantage of the fact that if we know the max suffix and prefix sums of the subproblems, we can produce the max subsequence sum in constant time. The expensive part was in fact computing these prefix and suffix sums: we had to spend linear work because we didn't know how to generate the prefix and suffix sums for the next level up without recomputing them. Can we easily fix this?

The idea is to return the overall sum together with the max prefix and suffix sums, so we return a total of 4 values: the max subsequence sum, the max prefix sum, the max suffix sum, and the overall sum. Having this information from the subproblems is enough to produce the same kind of answer tuple at every level up, in constant work and span per level. More specifically, we strengthen our problem to return a 4-tuple (mcss, max-prefix, max-suffix, total), and if the recursive calls return (m_1, p_1, s_1, t_1) and (m_2, p_2, s_2, t_2), then we return

    ( max(s_1 + p_2, m_1, m_2), max(p_1, t_1 + p_2), max(s_1 + t_2, s_2), t_1 + t_2 )

Here's the code for this algorithm:

    MCSS(A, i, j)
        if i = j
            then m <- max{A[i], 0}
                 p <- max{A[i], 0}
                 s <- max{A[i], 0}
                 t <- A[i]
                 return (m, p, s, t)
        mid <- floor((i + j)/2)
        (m_1, p_1, s_1, t_1) <- spawn MCSS(A, i, mid)
        (m_2, p_2, s_2, t_2) <- MCSS(A, mid + 1, j)
        sync
        m <- max{s_1 + p_2, m_1, m_2}
        p <- max{p_1, t_1 + p_2}
        s <- max{s_1 + t_2, s_2}
        t <- t_1 + t_2
        return (m, p, s, t)

Cost Analysis. We have the recurrences

    W(n) = 2W(n/2) + O(1) = O(n)
    S(n) = S(n/2) + O(1) = O(lg n)
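A sequential C++ sketch of the strengthened algorithm follows; the struct name Quad and the function name mcss4 are ours. As in the pseudocode, the base case clamps at 0, so the empty subsequence (sum 0) is allowed. The recursive calls are written sequentially here; spawning the first one as in the pseudocode changes nothing in the logic.

    #include <algorithm>
    #include <vector>

    // The 4-tuple (mcss, max-prefix, max-suffix, total) for a subarray A[i..j].
    struct Quad { long long m, p, s, t; };

    Quad mcss4(const std::vector<long long>& A, int i, int j) {
        if (i == j) {
            long long v = std::max(A[i], 0LL);
            return { v, v, v, A[i] };              // base case
        }
        int mid = (i + j) / 2;
        Quad a = mcss4(A, i, mid);                 // (m1, p1, s1, t1)
        Quad b = mcss4(A, mid + 1, j);             // (m2, p2, s2, t2)
        return { std::max({a.s + b.p, a.m, b.m}),  // mcss: cross, left, right
                 std::max(a.p, a.t + b.p),         // max prefix sum
                 std::max(a.s + b.t, b.s),         // max suffix sum
                 a.t + b.t };                      // total sum
    }

Calling mcss4(A, 0, (int)A.size() - 1).m on a nonempty vector yields the answer, with the constant-work combine applied once per level.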

2 Parallel Mergesort

We now turn to merge sort. Recall that in the first lecture we saw a parallel version of merge sort; here we do the obvious thing and make the recursive calls in parallel.

    MergeSort(A, n)
        if n = 1
            then return A
        Divide A into two halves A_left and A_right, each of size n/2
        A_left <- spawn MergeSort(A_left, n/2)
        A_right <- MergeSort(A_right, n/2)
        sync
        Merge the two halves into A
        return A

The work of the algorithm remains unchanged. What is the span? The recurrence is

    S_MergeSort(n) = S_MergeSort(n/2) + S_merge(n)

Since we are merging the arrays sequentially, the span of the merge call is Θ(n) and the recurrence solves to S_MergeSort(n) = Θ(n). Therefore, the parallelism of this merge sort is Θ(lg n), which is very small. In general, you want the parallelism to be polynomial in n, not logarithmic in n. What is the problem? It is the merge operation: doing the merge sequentially is the bottleneck.

3 Let's Parallelize the Merge in Mergesort

In mergesort, we generally merge two arrays of the same size. However, in order to get this parallel merge to work, we have to be more general: we must learn how to merge two arrays that may differ in size.

Problem Statement: Given two sorted arrays B[1..m] and C[1..l], we want to merge them into a sorted array A[1..n], where n = m + l. Without loss of generality, say that m >= l. Here's the procedure.

    ParallelMerge(B, m, C, l)
        if m < l
            then return ParallelMerge(C, l, B, m)
        if m = 1
            then concatenate the arrays in the right order and return
        mid <- m/2
        s <- Search(C, B[mid])
        A_left <- spawn ParallelMerge(B[1..mid], mid, C[1..s], s)
        A_right <- spawn ParallelMerge(B[mid+1..m], m - mid, C[s+1..l], l - s)
        sync
        concatenate A_left and A_right and return

Here Search(C, B[mid]) uses binary search to find the position s in C where B[mid] belongs, so that every element of C[1..s] is at most B[mid].

Let us calculate the work and span. The search takes Θ(lg n) work and span. Let k = mid + s, the size of the left recursive merge. First, we notice that

    k = mid + s = m/2 + s >= m/2 + 0 >= n/4

since m >= n/2. Since k and n - k are symmetric, we have n/4 <= k <= 3n/4. Therefore

    W_Merge(n) = W_Merge(k) + W_Merge(n - k) + Θ(lg n)
               = W_Merge(αn) + W_Merge((1 - α)n) + Θ(lg n)    for some 1/4 <= α <= 3/4
               = Θ(n)

Exercise 3 Draw the recursion tree for the above recurrence to solve it.

Exercise 4 Show using induction that the recurrence W(n) = W(αn) + W((1 - α)n) + Θ(lg n) solves to Θ(n).

For the span, we have:

    S_Merge(n) = max{S_Merge(k), S_Merge(n - k)} + Θ(lg n)
              <= S_Merge(3n/4) + Θ(lg n)
               = Θ(lg² n)

Note that parallelizing the merge procedure did not increase its work, which is exactly what we want: it is a work-efficient algorithm. And we reduced the span from Θ(n) to Θ(lg² n).

Work and Span of Parallel Merge Sort Using Parallel Merge

We can use this parallel merge procedure as a subroutine of merge sort, and the work remains Θ(n lg n). If we substitute the span of the parallel merge back into the merge sort recurrence, we get

    S_MergeSort(n) = S_MergeSort(n/2) + S_Merge(n)
                   = S_MergeSort(n/2) + Θ(lg² n)
                   = Θ(lg³ n)

The parallelism of this new merge sort is therefore Θ(n lg n / lg³ n) = Θ(n / lg² n), a polynomial amount of parallelism.
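To make this concrete, here is a Cilk Plus style C++ sketch of the parallel merge and the merge sort built on it. The buffer-passing interface, the sequential cutoff of 64, and the use of std::lower_bound as the Search step are our choices rather than anything fixed by the notes; a Cilk-enabled compiler such as OpenCilk is assumed.

    #include <algorithm>
    #include <cilk/cilk.h>
    #include <cstddef>

    // Merge sorted B[0..m) and C[0..l) into out[0..m+l).
    void pmerge(const long long* B, std::size_t m,
                const long long* C, std::size_t l, long long* out) {
        if (m < l) { pmerge(C, l, B, m, out); return; }   // keep m >= l
        if (m + l <= 64) {                                // small: merge sequentially
            std::merge(B, B + m, C, C + l, out);
            return;
        }
        std::size_t mid = m / 2;
        // Search: s = number of elements of C strictly less than B[mid].
        std::size_t s = std::lower_bound(C, C + l, B[mid]) - C;
        // Everything in B[0..mid) and C[0..s) is <= B[mid]; everything in
        // B[mid..m) and C[s..l) is >= B[mid], so the halves merge independently.
        cilk_spawn pmerge(B, mid, C, s, out);
        pmerge(B + mid, m - mid, C + s, l - s, out + mid + s);
        cilk_sync;
    }

    // Merge sort with parallel recursive calls and parallel merge;
    // tmp must point to scratch space of at least n elements.
    void pmergesort(long long* A, std::size_t n, long long* tmp) {
        if (n <= 1) return;
        std::size_t half = n / 2;
        cilk_spawn pmergesort(A, half, tmp);
        pmergesort(A + half, n - half, tmp + half);
        cilk_sync;
        pmerge(A, half, A + half, n - half, tmp);         // merge into scratch
        std::copy(tmp, tmp + n, A);                       // copy back into A
    }

Spawning only the first recursive call in each function is the idiomatic Cilk equivalent of the pseudocode's two spawns: the parent strand runs the second call itself, and cilk_sync waits for the spawned one.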