
1 Divide & Conquer

One strategy for designing efficient algorithms is the divide and conquer approach, which is also called, more simply, a recursive approach. The analysis of recursive algorithms often produces mathematical equations called recurrence relations. It is the analysis of these equations that produces the bound on the algorithm running time.

The divide and conquer strategy has three basic parts. For a given problem of size n,

1. divide: break a problem instance into several smaller instances of the same problem
2. conquer: if a smaller instance is trivial, solve it directly; otherwise, divide again
3. combine: combine the results of smaller instances together into a solution to a larger instance

That is, we split a given problem into several smaller instances (usually 2, but sometimes more depending on the problem structure), which are easier to solve, and then combine those smaller solutions together into a solution for the original problem. The goal is to keep dividing until we get the problem down to a small enough size that we can solve it quickly (or trivially).

Fundamentally, divide and conquer is a recursive approach, and most divide and conquer algorithms have the following structure:

function fun(n) {
    if n == trivial {
        solve and return
    } else {
        partA = fun(n')
        partB = fun(n - n')
        AB = combine(partA, partB)
        return AB
    }
}

The recursive structure of divide and conquer algorithms makes it useful to model their asymptotic running time T(n) using recurrence relations, which often have the general form

    T(n) = a T(g[n]) + f(n),    (1)

where a is a constant denoting the number of subproblems we break a given instance into, g[n] is a function of n that describes the size of the subproblems, and f(n) is the time required to combine the smaller results / divide the problem into the smaller versions. There are several strategies for solving recurrence relations. We'll cover two in this unit: the unrolling method and the master method; other methods include annihilators, changing variables, characteristic polynomials and recurrence trees. You can read about many of these in the textbook (Chapter 4).

Figure 1: Schematic of the divide & conquer strategy, in which we recursively divide a problem of size n into subproblems of size n' and n - n', until a trivial case is encountered (left side). The results of each pair of subproblems are combined into the solution for the larger problem. The computation of any divide & conquer algorithm can thus be viewed as a tree.

Figure 2: Examples of unbalanced and balanced binary trees. A fully unbalanced tree has depth Θ(n) while a fully balanced tree has depth Θ(log n). As we go through the Quicksort analysis below, keep these pictures in mind, as they will help you understand why Quicksort is a fast algorithm.

Consider the case of a = 2. Figure 1 shows a schematic of the way a generic divide and conquer algorithm works. The algorithm effectively builds a computation or recursion tree, in which an internal node represents a specific non-trivial subproblem. Trivial or base cases are located at the leaves. As the function explores the tree, it uses a stack data structure (CLRS Chapter 10.1) to store the previous, not-yet-completed problems, to which the computer will return once it has completed the subproblems rooted at a given internal node. The path of the calculation through the recursion tree is a depth-first exploration.
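To preview the unrolling method, here is a small worked example (an illustration, not one of the notes' own exercises): take a = 2, g[n] = n/2, and f(n) = cn in Eq. (1), i.e., the balanced case with a linear combine cost, and repeatedly substitute the recurrence into itself:

    T(n) = 2 T(n/2) + cn
         = 2 [ 2 T(n/4) + c(n/2) ] + cn = 4 T(n/4) + 2cn
         = 4 [ 2 T(n/8) + c(n/4) ] + 2cn = 8 T(n/8) + 3cn
         ...
         = 2^k T(n/2^k) + k cn.

The unrolling stops when n/2^k = 1, i.e., when k = lg n, giving T(n) = n T(1) + cn lg n = Θ(n log n), exactly the cost profile of the balanced tree in Figure 2.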

The division of a problem of size n into subproblems of sizes n - n' and n' determines the depth of the tree, which determines how many times we incur the f(n) cost. The sum of these costs is the running time. Consider the cases of n' = 1 and n' = n/2 (see Figure 2). In the first case, each time we recurse, we carve off a trivial case from the current size, and this produces highly unbalanced recursion trees, with depth n. In the second case, we divide the problem in half each time, and the depth of the tree is lg n. If f(n) = ω(1), then the total cost is different between the two cases; in particular, the more balanced case is lower cost.

2 Quicksort

Quicksort is a classic divide and conquer algorithm, which uses a comparison-based approach for sorting a list of n numbers. In the naïve version, the worst case is Θ(n²) but its expected (average) case is O(n log n). This is asymptotically faster than algorithms like Bubble Sort¹ and Insertion Sort², whose average and worst cases are both Θ(n²). The deterministic version of Quicksort also exhibits O(n²) worst case behavior, which is not particularly fast. However, Quicksort's average case is O(n log n), which matches the provable asymptotic lower bound for comparison-based sorting algorithms. By using randomness in a clever way, we can trick Quicksort into behaving as if every input is an average input, and thus producing very fast sorting. This is just as fast as another divide and conquer sorting algorithm called Mergesort.³ This randomized version of Quicksort is generally considered the best sorting algorithm for large inputs. If implemented correctly, Quicksort is also an in place sorting algorithm, meaning that it requires only O(1) scratch space in addition to the space required for the input itself.⁴

¹ Bubble Sort. Let A be an array with n elements. While any pair A[i] and A[i+1] are out of order, sweep i = 1 to n-1 and, for each value of i, if A[i] and A[i+1] are out of order then swap them.
² Insertion Sort. Let A be an array with n elements. For i = 2 to n, first set j = i. Then, while j > 1 and A[j-1], A[j] are out of order, swap them and decrement j.
³ Mergesort. Let A be an array with n elements. If n = 1, A is already sorted. Otherwise, divide A into two equal sized subarrays and call Mergesort on each. Then step through the resulting two arrays, left-to-right, and merge them so that A is now sorted and return A.
⁴ Many implementations of Quicksort are not in-place algorithms because they copy intermediate results into an array the same size as the input; similarly, safe recursion, which passes variables by copy rather than by reference, does not yield an in-place algorithm.
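Footnote 3 describes Mergesort in prose only; the following Python sketch (illustrative, not code from the notes) makes its divide, conquer, and combine steps explicit:

def mergesort(A):
    # Trivial case: a list of length 0 or 1 is already sorted.
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    left = mergesort(A[:mid])      # conquer the left half
    right = mergesort(A[mid:])     # conquer the right half
    merged, i, j = [], 0, 0        # combine: merge the halves left-to-right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # append whichever tail remains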

Here are the divide, conquer and combine steps of the Quicksort algorithm:

1. divide: pick some element A[q] of the array A and partition A into two arrays A1 and A2 such that every element in A1 is <= A[q] and every element in A2 is > A[q]. We call the element A[q] the pivot element.

2. conquer: Quicksort A1 and Quicksort A2

3. combine: return A1 concatenated with A[q] concatenated with A2, which is now the sorted version of A.

The trivial or base case for Quicksort can either be sorting an array of length 1, which requires no work, or sorting an array of length 2, which requires at most one comparison, three assignment operations and one temporary variable of constant size. Thus, the base case takes Θ(1) time and Θ(1) scratch space.

2.1 Pseudocode

Here is pseudocode for the main Quicksort procedure, which takes an array A and array bounds p, r as inputs.

// Precondition: A is the array to be sorted, p >= 1;
//               r is <= the size of A
// Postcondition: A[p..r] is in sorted order
Quicksort(A, p, r) {
    if (p < r) {
        q = Partition(A, p, r)
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)
    }
}

This procedure calls another function Partition before it hits either recursive call. This implies that Partition must do something such that no additional work is required to stitch together the solutions of the subproblems. Partition takes the same types of inputs as Quicksort:

// Precondition: A[p..r] is the array to be partitioned, p >= 1 and r <= size of A;
//               A[r] is the pivot element
// Postcondition: Let A' be the array A after the function is run, and let res be
//               the value Partition returns. Then A'[p..r] contains the same
//               elements as A[p..r]. Further, all elements in A'[p..res-1] are
//               <= A[r], A'[res] = A[r], and all elements in A'[res+1..r] are > A[r]
Partition(A, p, r) {
    x = A[r]
    i = p - 1
    for (j = p; j <= r-1; j++) {
        if A[j] <= x {
            i++
            exchange(A[i], A[j])
        }
    }
    exchange(A[i+1], A[r])
    return i + 1
}

Figure 3: The intermediate state of the Partition function: values in the "<= x" region have already been compared to the pivot x, found to be less than or equal in size and moved into this region; values in the "> x" region have been compared to x, found to be greater in size and moved into this region; values in the "?" region are each compared to x and then moved into either of the two other regions, depending on the result of the comparison.
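For readers who want to execute the pseudocode, here is a direct Python translation (a sketch, not code from the notes). Python lists are 0-indexed, so the preconditions become p >= 0 and r <= len(A) - 1:

def quicksort(A, p, r):
    # Sort A[p..r] in place by partitioning around a pivot.
    if p < r:
        q = partition(A, p, r)        # pivot lands at its final index q
        quicksort(A, p, q - 1)        # recurse on the <= pivot side
        quicksort(A, q + 1, r)        # recurse on the  > pivot side

def partition(A, p, r):
    x = A[r]                          # pivot is the last element
    i = p - 1                         # right edge of the "<= x" region
    for j in range(p, r):             # j scans the unprocessed "?" region
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]   # grow the "<= x" region
    A[i + 1], A[r] = A[r], A[i + 1]   # move pivot between the two regions
    return i + 1

A = [2, 6, 4, 1, 5, 3]                # the example array from Section 2.3
quicksort(A, 0, len(A) - 1)
print(A)                              # prints [1, 2, 3, 4, 5, 6]

A randomized variant of this sketch would swap a uniformly random element of A[p..r] into position r before partitioning, which is one way to realize the "using randomness in a clever way" idea mentioned above.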

2.2 Correctness

Whenever we design an algorithm, we must endeavor to prove that it is correct, i.e., that it yields the proper output on all possible inputs, even the worst of them. In other words, a correct algorithm does not fail on any input. If there exists even one input for which an algorithm does not perform correctly, then the algorithm is not correct. Our task as algorithm designers is to produce correct algorithms; that is, we need to prove that our algorithm never fails.

To show that an algorithm is correct, we identify some mathematical structure in its behavior that allows us to show that the claimed performance holds on all possible inputs. To prove that Quicksort is correct, we use the following insight: at each intermediate step of Partition, the array is composed of exactly 4 regions, with x being the pivot (see Figure 3):

Region 1: values that are <= x (between locations p and i)
Region 2: values that are > x (between locations i+1 and j-1)
Region 3: unprocessed values (between locations j and r-1)
Region 4: the value x (location r)

Crucially, throughout the execution of Partition, regions 1 and 2 are growing, while region 3 is shrinking. At the end of the loop inside Partition, region 3 is empty and all values except the pivot are in regions 1 and 2; Partition then moves the pivot into its correct and final place.

To prove the correctness of Quicksort, we will use what's called a loop invariant, which is a set of properties that are true at the beginning of each cycle through the for loop, for any location k. For Quicksort, here are the loop invariants:

1. If p <= k <= i then A[k] <= x
2. If i+1 <= k <= j-1 then A[k] > x
3. If k = r then A[k] = x

Verify for yourself (as an at-home exercise) that (i) this invariant holds before the loop begins (initialization), (ii) if the invariant holds at the (i-1)th iteration, then it will hold after the ith iteration (maintenance), and (iii) if the invariant holds when the loop exits, then the array will be successfully partitioned (termination). This proves the correctness of Partition.

The correctness of Quicksort follows by observing that after Partition, we have correctly divided the larger problem into two subproblems (values less than the pivot and values greater than the pivot), both of which we solve by calling Quicksort on each. After the first call to Quicksort returns, the subarray A[p..q-1] is sorted; after the second call returns, the subarray A[q+1..r] is sorted. Because Partition is correct, the values in the first subarray are all less than A[q], and the values in the second subarray are all greater than A[q]. Thus, the subarray A[p..r] is in sorted order, and Quicksort is correct.
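To build intuition for the invariant, here is a hypothetical instrumented version of the Python partition sketch above (an at-home exercise aid, not part of the notes) that asserts all three properties at the top of every loop iteration:

def partition_checked(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        # Invariant 1: everything in A[p..i] is <= x.
        assert all(A[k] <= x for k in range(p, i + 1))
        # Invariant 2: everything in A[i+1..j-1] is > x.
        assert all(A[k] > x for k in range(i + 1, j))
        # Invariant 3: the pivot x is still at location r.
        assert A[r] == x
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

Running partition_checked on any input raises no AssertionError, which is evidence (though not a proof) that the invariant is maintained throughout the loop.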

2.3 An example

To show concretely how Quicksort works, consider the array A = [2,6,4,1,5,3]. The following table shows the execution of Quicksort on this input. The pivot x for each invocation of Partition appears in the arguments column, and its final position is set off by vertical bars in the output column.

procedure          arguments              input               output              global array
first partition    x = 3, p = 0, r = 5    A = [2,6,4,1,5,3]   [2,1 | 3 | 6,5,4]   [2,1,3,6,5,4]
L QS, partition    x = 1, p = 0, r = 1    A = [2,1]           [1 | 2]             [1,2,3,6,5,4]
R QS, partition    x = 4, p = 3, r = 5    A = [6,5,4]         [4 | 5,6]           [1,2,3,4,5,6]
L QS, partition    do nothing                                                     [1,2,3,4,5,6]
R QS, partition    x = 6, p = 4, r = 5    A = [5,6]           [5 | 6]             [1,2,3,4,5,6]
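Assuming the quicksort/partition sketch from Section 2.1 above, a small hypothetical tracing wrapper reproduces the table's global-array column:

def quicksort_traced(A, p, r):
    # Print the global array after every Partition call (one line per row
    # of the table above, except the empty "do nothing" calls).
    if p < r:
        q = partition(A, p, r)     # partition from the earlier sketch
        print("x =", A[q], " p =", p, " r =", r, " -> ", A)
        quicksort_traced(A, p, q - 1)
        quicksort_traced(A, q + 1, r)

A = [2, 6, 4, 1, 5, 3]
quicksort_traced(A, 0, len(A) - 1)
# x = 3  p = 0  r = 5  ->  [2, 1, 3, 6, 5, 4]
# x = 1  p = 0  r = 1  ->  [1, 2, 3, 6, 5, 4]
# x = 4  p = 3  r = 5  ->  [1, 2, 3, 4, 5, 6]
# x = 6  p = 4  r = 5  ->  [1, 2, 3, 4, 5, 6]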

3 On your own

1. Read Chapters 4 and 7

A brief aside about proofs by induction

Induction is a simple and powerful technique for proving mathematical claims, and it is one of the most common tools in algorithm analysis, precisely because of these properties. When we invoke this approach to proving a claim, we must state exactly what variable or property we are using induction on.

Consider the example from Lecture 1, the sum of the first n positive integers, S(n) = 1 + 2 + 3 + ... + n. We claim that a closed form for this sum is

    S(n) = n(n+1)/2.    (2)

We will mathematically prove, via induction on n, that n(n+1)/2 is the correct closed-form solution.⁵

⁵ Here, there is only one possible choice, but for more complex algorithms, there will be several choices. In those cases, our job is to choose correctly, so as to make the induction straightforward. A hint that you have chosen incorrectly is that the induction is hard to push through.

A proof by induction has two parts:

Base case: in which we verify that S(n) is correct for the smallest possible value, in this case n = 1. (If the claim is about some recurrence relation, the base cases will often be given to you, e.g., T(0) = c1 and T(1) = c2, for two base cases where c1 and c2 are constants.) That is, S(1) = 1(1+1)/2 = 1.

Inductive step: in which we (i) assume that S(n) holds for an arbitrary input of size n and then (ii) prove that it also holds for n+1. In practice, this amounts to taking Eq. (2), or its equivalent in your problem of choice, replacing each instance of n with an instance of n+1 and then simplifying:

    S(n+1) = (n+1)((n+1)+1)/2
           = (n+1)(n+2)/2
           = (n² + 3n + 2)/2
           = (n² + n + 2n + 2)/2
           = n(n+1)/2 + (2n+2)/2
           = S(n) + (n+1),

where in the final step we have used the definition of S(n) to reduce a more complicated expression to a simpler one, which completes the inductive step. (In some problems, there will be 2 cases to consider, e.g., n >= c for a constant c, and n < c, and in that situation, the inductive step must be applied to each.)

In this class, we aim to present more formal proofs. Thus, rather than the more wordy, thinking-out-loud version above, here is a compact, formal proof, which states the claim precisely, states the assumptions clearly, and shows exactly what is necessary to understand the proof.

Theorem: The sum of the first n positive integers is S(n) = n(n+1)/2.

Proof: Let n be an arbitrary positive integer, and assume inductively that S(n) = n(n+1)/2. The base case n = 1 is S(1) = 1(1+1)/2 = 1, and for any n >= 1 we have

    S(n+1) = (n+1)((n+1)+1)/2    [ definition of S(n+1) ]
           = n(n+1)/2 + (n+1)    [ algebra ]
           = S(n) + (n+1).       [ inductive hypothesis with S(n) ]

If you would like to see more examples of proofs by induction or get a deeper mathematical explanation of why and how they work, see the excellent notes by Jeff Erickson at UIUC: http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/98-induction.pdf
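As a quick numerical sanity check (illustrative only; checking finitely many cases is consistent with, but does not replace, the induction proof), the closed form can be compared against a brute-force sum in Python:

# Verify S(n) = n(n+1)/2 against the brute-force sum for n = 1..100.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("closed form matches the brute-force sum for n = 1..100")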