CS161, Lecture 2
MergeSort, Recurrences, Asymptotic Analysis
Scribe: Michael P. Kim
Date: September 28, 2016
Edited by Ofir Geri

1 Introduction

Today, we will introduce a fundamental algorithm design paradigm, Divide-And-Conquer, through a case study of the MergeSort algorithm. Along the way, we'll introduce guiding principles for algorithm design, including Worst-Case and Asymptotic Analysis, which we will use throughout the remainder of the course. We will also introduce asymptotic ("Big-Oh") notation for analyzing the run times of algorithms.

2 MergeSort and the Divide-And-Conquer Paradigm

The sorting problem is a canonical computer science problem. The problem is specified as follows: as input, you receive an array of n numbers, possibly unsorted; the goal is to output the same numbers, sorted in increasing order. Computer scientists care a lot about sorting because many other algorithms use sorting as a subroutine. Thus, it is extremely important to find efficient algorithms for sorting lists that work well in theory and in practice. In this lecture, we'll assume that the elements given in the array are distinct. It is a worthwhile exercise to go through the notes carefully and see which aspects of our analysis would need to change if we allow for ties amongst elements.

2.1 InsertionSort

There are a number of natural algorithms for sorting a list of numbers. One such algorithm is InsertionSort. InsertionSort iterates through the elements of the given array and after each iteration produces a sorted array of the elements considered so far. At iteration i, the algorithm inserts the ith element into the right position in the sorted array of the first i - 1 elements. The pseudocode [1] of InsertionSort from Section 2.1 of CLRS appears as Algorithm 1.

Algorithm 1: InsertionSort(A)
    for i = 2 to length(A) do
        key ← A[i]
        j ← i - 1
        while j > 0 and A[j] > key do
            A[j + 1] ← A[j]
            j ← j - 1
        A[j + 1] ← key

At iteration i, the algorithm may be required to move i elements, so the total runtime is roughly sum_{i=1}^{n} i = n(n+1)/2 = O(n^2). The natural question to ask (which should become your mantra this quarter) is "Can we do better?" In fact, the answer to this question is "Yes." The point of today's lecture is to see how we might do better, and to begin to understand what "better" even means.

[1] A note on pseudocode: We will write our algorithms in pseudocode. The point of pseudocode is to be broadly descriptive: to convey your approach in solving a problem without getting hung up on the specifics of a programming language. In fact, one of the key benefits of using pseudocode to describe algorithms is that you can take the algorithm and implement it in any language you want based on your needs.
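To complement Algorithm 1, here is a minimal Python sketch of InsertionSort (not part of the original notes). It follows the pseudocode directly, except that it uses 0-based indexing, so the outer loop starts at index 1 rather than 2.

    # Illustrative Python version of Algorithm 1 (0-indexed, sorts in place).
    def insertion_sort(A):
        for i in range(1, len(A)):
            key = A[i]
            j = i - 1
            # Shift elements larger than key one position to the right.
            while j >= 0 and A[j] > key:
                A[j + 1] = A[j]
                j -= 1
            A[j + 1] = key
        return A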

2.2 Divide-And-Conquer

The Divide-And-Conquer paradigm is a broad pattern for designing algorithms for many problems. Unsurprisingly, the pattern uses the following strategy to solve problems:

  - Break the problem into subproblems.
  - Solve the subproblems (recursively).
  - Combine the results of the subproblems.

This strategy requires that once the instances become small enough, the problem is trivial to solve (or it is cheap to find a solution through brute force). With this pattern in mind, there is a very natural way to formulate a Divide-And-Conquer algorithm for the sorting problem. Consider the following pseudocode for MergeSort (Algorithm 2). A[i : j] denotes the subarray of A from index i to j (including both A[i] and A[j]).

Algorithm 2: MergeSort(A)
    n ← length(A)
    if n ≤ 1 then return A
    L ← MergeSort(A[1 : n/2])
    R ← MergeSort(A[n/2 + 1 : n])
    return Merge(L, R)

Now, we need to describe the Merge procedure, which takes two sorted arrays, L and R, and produces a sorted array containing the elements of L and R. Consider the following Merge procedure (Algorithm 3), which we will call as a subroutine in MergeSort.

Algorithm 3: Merge(L, R)
    m ← length(L) + length(R)
    S ← empty array of size m
    i ← 1; j ← 1
    for k = 1 to m do
        if L(i) < R(j) then
            S(k) ← L(i)
            i ← i + 1
        else
            S(k) ← R(j)
            j ← j + 1
    return S

Intuitively, Merge loops through every position in the final array, and in the kth iteration it finds the kth smallest element. (As a special case, consider the fact that the smallest element of S must be either the smallest element of L or the smallest element of R.) Because L and R are sorted, we can find the kth smallest element quickly by keeping track of which elements we've processed so far. (The Merge subroutine, as written, is actually incomplete. You should think about how you would have to edit the pseudocode to handle what happens when we get to the end of L or R. [2])

[2] Furthermore, sorting functions usually change the order of elements in the given array instead of returning a sorted copy of it. For a more complete implementation of MergeSort, see Section 2.3.1 in CLRS.
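One possible way to fill in that gap is sketched below in Python (not from the original notes): when one of the two lists is exhausted, the remaining elements of the other list are appended directly. Like Algorithm 2, this version returns a sorted copy rather than sorting in place.

    # Illustrative Python sketch of Algorithms 2 and 3 (0-indexed).
    def merge(L, R):
        S = []
        i, j = 0, 0
        # Repeatedly take the smaller of the two front elements.
        while i < len(L) and j < len(R):
            if L[i] < R[j]:
                S.append(L[i])
                i += 1
            else:
                S.append(R[j])
                j += 1
        # Once one list is exhausted, copy over whatever remains of the other.
        S.extend(L[i:])
        S.extend(R[j:])
        return S

    def merge_sort(A):
        if len(A) <= 1:
            return A
        mid = len(A) // 2
        return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))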

Now that we have an algorithm, the first question we always want to ask is: Is the algorithm correct? In this case, to answer this question, we will define an invariant which we will claim holds at every recursive call.

[Figure 2.4 from CLRS: schematic of the levels of recursive calls, or "recursion tree," and the resulting calls to Merge.]

The invariant will be the following: in every recursive call, MergeSort returns a sorted array. If we can prove that this invariant holds, it will immediately prove that MergeSort is correct, as the first call to MergeSort will return a sorted array. Here, we will prove that the invariant holds.

Proof of Invariant. By induction. Consider the base case of the algorithm, when MergeSort receives 1 element. Any array of 1 element is (trivially) sorted, so in this case, by returning A, MergeSort returns a sorted list. For the inductive step, suppose that after n - 1 ≥ 1 levels of recursive calls return, MergeSort still returns a sorted list. We will argue that the nth recursive call returns a sorted list. Consider some execution of MergeSort where L and R were just returned after n - 1 recursive calls. Then, by the inductive hypothesis, L and R are both sorted. We argue that the result of Merge(L, R) will be a sorted list. To see this, note that in the Merge subroutine, the minimum remaining element must be at either the ith position of L or the jth position of R. When we find the minimum element, we increment i or j accordingly, which results in the next remaining minimal element still being at the ith position of L or the jth position of R. Thus, Merge will construct a sorted list, and our induction holds.
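As an informal sanity check on this claim (not part of the original notes), one can compare the Python sketch of merge_sort above against Python's built-in sorted on random inputs with distinct elements, matching the assumption made in this lecture:

    # Hypothetical smoke test for the merge_sort sketch above.
    import random

    for _ in range(1000):
        A = random.sample(range(10**6), k=random.randint(0, 100))  # distinct values
        assert merge_sort(A) == sorted(A)
    print("merge_sort agrees with sorted() on 1000 random inputs")

Of course, passing random tests is evidence, not a proof; the inductive argument above is what actually establishes correctness.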

2.3 Running Time Analysis

After answering the question of correctness, the next question to ask is: Is the algorithm good? Well, that depends on how we define the goodness of an algorithm. In this class, as is typical, we will generally reframe this question as: How does the running time of the algorithm grow with the size of the input? Generally, the slower the running time grows as the input increases in size, the better the algorithm.

Our eventual goal is to argue about the running time of MergeSort, but this seems a bit challenging. In particular, each call to MergeSort makes a number of recursive calls and then calls Merge; it isn't immediately obvious how we should bound the time that MergeSort takes. A seemingly less ambitious goal would be to analyze how long a call to Merge takes. We will start here, and see if this gives us something to grasp onto when arguing about the total running time.

Consider a single call to Merge, where we'll assume the total size of S is m numbers. How long will it take for Merge to execute? To start, there are two initializations, for i and j. Then, we enter a for loop which will execute m times. Each iteration requires one comparison, followed by an assignment to S and an increment of i or j. Finally, we'll need to increment the counter k of the for loop. If we assume that each operation costs us a certain amount of time, say Cost_a for an assignment, Cost_c for a comparison, and Cost_i for incrementing a counter, then we can express the total time of the Merge subroutine as follows:

    2 Cost_a + m (Cost_a + Cost_c + 2 Cost_i)

This is a precise, but somewhat unruly, expression for the running time. In particular, it seems difficult to keep track of lots of different constants, and it isn't clear which costs will be more or less expensive (especially if we switch programming languages or machine architectures). To simplify our analysis, we choose to assume that there is some global constant c_op which represents the cost of an operation. You may think of c_op as max{Cost_a, Cost_c, Cost_i, ...}. We can then bound the running time of Merge by

    2 c_op + 4 c_op m,

that is, 2 + 4m operations. Using the fact that m ≥ 1, we can upper bound this running time by 6m operations.

Now that we have a bound on the number of operations required by a Merge of m numbers, we want to translate this into a bound on the number of operations required by MergeSort. At first glance, the pessimist in you may be concerned that at each level of recursive calls, we're spawning an exponentially increasing number of copies of MergeSort (because the number of calls at each depth doubles). Dual to this, the optimist in you will notice that at each level, the inputs to the problems are decreasing at an exponential rate (because the input size halves with each recursive call). Today, the optimists win out.

Claim 1. MergeSort requires at most 6n log n + 6n operations to sort n numbers. [3]

[3] In this lecture and all future lectures, unless explicitly stated otherwise, log refers to log base 2.

Before we go about proving this bound, let's first consider whether this running time bound is good. We mentioned earlier that more obvious methods of sorting, like InsertionSort, required roughly n^2 operations. How does n^2 = n * n compare to n log n? An intuitive definition of log n is the following: enter n into your calculator and divide by 2 until the total is 1; the number of times you divided is the logarithm of n. This number in general will be significantly smaller than n. In particular, if n = 32, then log n = 5; if n = 1024, then log n = 10. Already, to sort arrays of 10^3 numbers, the savings of n log n as compared to n^2 will be orders of magnitude. At larger problem instances of 10^6, 10^9, etc., the difference will become even more pronounced! n log n is much closer to growing linearly (with n) than it is to growing quadratically (with n^2).

One way to argue about the running time of recursive algorithms is to use recurrence relations. A recurrence relation for a running time expresses the time it takes to solve an input of size n in terms of the time required to solve the recursive calls the algorithm makes. In particular, we can write the running time T(n) for MergeSort on an array of n numbers as the following expression:

    T(n) = T(n/2) + T(n/2) + T_Merge(n) ≤ 2 T(n/2) + 6n

There are a number of sophisticated and powerful techniques for solving recurrences. We will cover many of these techniques in the coming lectures. Today, we can actually analyze the running time directly.

Proof of Claim 1. Consider the recursion tree of a call to MergeSort on an array of n numbers. Assume for simplicity that n is a power of 2. Let's refer to the initial call as Level 0, the succeeding recursive calls as Level 1, and so on, numbering each level of recursion by its depth in the tree. How deep is the tree? At each level, the size of the inputs is divided in half, and there are no recursive calls when the input size is 1 element. By our earlier definition of the logarithm, this means the bottom level will be Level log n. Thus, there will be a total of log n + 1 levels. We can now ask two questions: (1) How many subproblems are there at Level i? (2) How large are the individual subproblems at Level i? We can observe that at the ith level, there are 2^i subproblems, each with an input of size n/2^i.

So how much work do we do overall at the ith level? First, we need to make two recursive calls, but the work done for these recursive calls will be accounted for by Level i + 1. Thus, we are really concerned with the cost of the call to Merge at Level i. We can express the work per level as follows.

[Figure 2.5 from CLRS: analyzing the running time in terms of a recursion tree.]

    Work at Level i = (number of subproblems) × (work per subproblem)
                    = 2^i × 6 (n / 2^i)
                    = 6n

Importantly, we can see that the work done at Level i is independent of i; it only depends on n and is the same for every level. This means we can bound the total running time as follows:

    Total RT = (work per level) × (number of levels)
             = (6n) × (log n + 1)
             = 6n log n + 6n
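As a quick numerical sanity check (not in the original notes), the Python snippet below evaluates the recurrence T(n) ≤ 2 T(n/2) + 6n on powers of two, taking T(1) = 1 for concreteness, and compares it with the bound from Claim 1 and with n^2 / 2 for contrast with InsertionSort:

    # Sketch: compare the MergeSort recurrence to the Claim 1 bound (base case T(1) = 1 assumed).
    import math

    def T(n):
        return 1 if n == 1 else 2 * T(n // 2) + 6 * n

    for k in range(1, 11):
        n = 2 ** k
        bound = 6 * n * math.log2(n) + 6 * n
        assert T(n) <= bound
        print(f"n = {n:5d}   T(n) = {T(n):7d}   6n log n + 6n = {bound:9.0f}   n^2 / 2 = {n * n // 2}")

Even for modest n the recurrence stays well below n^2 / 2, and the gap widens as n grows.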

3 Guiding Principles for Algorithm Design and Analysis

After going through the algorithm and analysis, it is natural to wonder if we've been too sloppy. In particular, note that the algorithm never looks at the input. For instance, what if we received the sequence of numbers [1, 2, 3, 5, 4, 6, 7, 8]? Clearly, there is a sorting algorithm for this sequence that only takes a few operations, but MergeSort runs through all log n + 1 levels of recursion anyway. Would it be better to try to design our algorithms with this in mind? Additionally, in our analysis, we've given a very loose upper bound on the time required by Merge and dropped a number of constant factors and lower order terms. Is this a problem? In what follows, we'll argue that these are actually features, not bugs, in the design and analysis of the algorithm.

3.1 Worst-Case Analysis

One guiding principle we'll use throughout the class is that of Worst-Case Analysis. In particular, this means that we want any statement we make about our algorithms to hold for every possible input. Stated differently, we can think about playing a game against an adversary, who wants to maximize our running time (make it as bad as possible). We get to specify an algorithm and state a running time T(n); the adversary then chooses an input. We win the game if, even in the worst case, whatever input the adversary chooses (of size n), our algorithm runs in at most T(n) time. Note that because our algorithm made no assumptions about the input, our running time bound will hold for every possible input. This is a very strong, robust guarantee. [4]

[4] In the case where you have significant domain knowledge about which inputs are likely, you may choose to design an algorithm that works well in expectation on these inputs (this is frequently referred to as Average-Case Analysis). This type of analysis is less common and can lead to difficulties. For example, understanding which inputs are more likely to appear can be very hard, or the algorithms can be too tailored to fit our assumptions about the input.

3.2 Asymptotic Analysis

Throughout our argument about MergeSort, we combined constants (think Cost_a, Cost_i, etc.) and gave very loose upper bounds (e.g., 6m ≥ 4m + 2). Why did we choose to do this? First, it makes the math much easier. But does it come at the cost of getting the right answer? Would we get a more predictive result if we threw all these exact expressions back into the analysis?

From the perspective of an algorithm designer, the answer to both of these questions is a resounding "No." As an algorithm designer, we want to come up with results that are broadly applicable, whose truth does not depend on features of a specific programming language or machine architecture. The constants that we've dropped will depend greatly on the language and machine on which you're working. For the same reason we use pseudocode instead of writing our algorithms in Java, trying to quantify the exact running time of an algorithm would be inappropriately specific. This is not to say that constant factors never matter in applications (e.g., I would be rather upset if my web browser ran 7 times slower than it does now), but worrying about these factors is not the goal of this class. In this class, our goal will be to argue about which strategies for solving problems are wise and why.

In particular, we will focus on Asymptotic Analysis. This type of analysis focuses on the running time of your algorithm as your input size gets very large (i.e., as n → +∞). This framework is motivated by the fact that if we need to solve a small problem, it doesn't cost that much to solve it by brute force. If we want to solve a large problem, we may need to be much more creative in order for the problem to run efficiently. From this perspective, it should be very clear that 6n(log n + 1) is much better than n^2 / 2. (If you are unconvinced, try plugging in some values for n.) Intuitively, we'll say that an algorithm is fast when the running time grows slowly with the input size. In this class, we want to think of growing slowly as growing as close to linear as possible. Based on this intuitive notion, we can come up with a formal system for analyzing how quickly the running time of an algorithm grows with its input size.

3.3 Asymptotic Notation

To talk about the running time of algorithms, we will use the following notation. T(n) denotes the runtime of an algorithm on an input of size n.

Big-Oh Notation: Intuitively, Big-Oh notation gives an upper bound on a function. We say T(n) is O(f(n)) when, as n gets big, f(n) grows at least as quickly as T(n). Formally, we say

    T(n) = O(f(n))  ⟺  ∃ c, n_0 > 0 such that ∀ n ≥ n_0, 0 ≤ T(n) ≤ c · f(n)
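As a concrete worked example (not from the original notes), the MergeSort bound from Claim 1 fits this definition with explicit constants: for all n ≥ 2 we have log n ≥ 1, so 6n ≤ 6n log n, and therefore

    6n log n + 6n ≤ 6n log n + 6n log n = 12 n log n,

which shows 6n log n + 6n = O(n log n) with c = 12 and n_0 = 2.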

Big-Omega Notation: Intuitively, Big-Omega notation gives a lower bound on a function. We say T(n) is Ω(f(n)) when, as n gets big, f(n) grows at least as slowly as T(n). Formally, we say

    T(n) = Ω(f(n))  ⟺  ∃ c, n_0 > 0 such that ∀ n ≥ n_0, 0 ≤ c · f(n) ≤ T(n)

Big-Theta Notation: T(n) is Θ(f(n)) if and only if T(n) = O(f(n)) and T(n) = Ω(f(n)). Equivalently, we can say that

    T(n) = Θ(f(n))  ⟺  ∃ c_1, c_2, n_0 > 0 such that ∀ n ≥ n_0, 0 ≤ c_1 · f(n) ≤ T(n) ≤ c_2 · f(n)

[Figure 3.1 from CLRS: examples of asymptotic bounds. (Note: in these examples, f(n) corresponds to our T(n) and g(n) corresponds to our f(n).)]

We can see that these notations really do capture exactly the behavior that we want: namely, to focus on the rate of growth of a function as the inputs get large, ignoring constant factors and lower order terms. As a sanity check, consider the following example and non-example.

Claim 2. All degree-k polynomials are O(n^k).

Proof of Claim. Suppose T(n) is a degree-k polynomial. That is, T(n) = a_k n^k + ... + a_1 n + a_0 for some choice of a_i's where a_k ≠ 0. To show that T(n) is O(n^k), we must find a c and n_0 such that for all n ≥ n_0, T(n) ≤ c · n^k. (Since T(n) represents the running time of an algorithm, we assume it is positive.) Let n_0 = 1 and let a = max_i |a_i|. We can bound T(n) as follows:

    T(n) = a_k n^k + ... + a_1 n + a_0
         ≤ a n^k + ... + a n + a
         ≤ a n^k + ... + a n^k + a n^k
         = (k + 1) a n^k

Let c = (k + 1) a, which is a constant, independent of n. Thus, we've exhibited c and n_0 which satisfy the Big-Oh definition, so T(n) = O(n^k).

Claim 3. For any k ≥ 1, n^k is not O(n^(k-1)).

Proof of Claim. By contradiction. Assume n^k = O(n^(k-1)). Then there is some choice of c and n_0 such that n^k ≤ c · n^(k-1) for all n ≥ n_0. But this in turn means that n ≤ c for all n ≥ n_0, which contradicts the fact that c is a constant, independent of n. Thus, our original assumption was false, and n^k is not O(n^(k-1)).
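To make Claim 2 concrete, here is one worked instance (not from the original notes). For T(n) = 3n^2 + 10n + 5 we have k = 2 and a = max{|3|, |10|, |5|} = 10, so for all n ≥ 1,

    3n^2 + 10n + 5 ≤ 10n^2 + 10n^2 + 10n^2 = 30n^2,

i.e., T(n) = O(n^2) with c = (k + 1) a = 30 and n_0 = 1, exactly as the proof constructs.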