
CS 161, Lecture 5: Quicksort
Scribe: Sam Keller (2015), Seth Hildick-Smith (2016), G. Valiant (2017)
Date: January 25, 2017

1 Introduction

Today we'll study another sorting algorithm. Quicksort was invented in 1959 by Tony Hoare. You may wonder why we want to study a new sorting algorithm. We have already studied MergeSort, which we showed to perform significantly better than the trivial O(n²) algorithm. While MergeSort achieves an O(n log n) worst-case asymptotic bound, in practice there are a number of implementation details about MergeSort that make it tricky to achieve high performance. Quicksort is an alternative algorithm which is simpler to implement in practice. Quicksort will also use a divide and conquer strategy, but will use randomization to improve the performance of the algorithm in expectation. Java, Unix, and the C stdlib all have implementations of Quicksort as one of their built-in sorting routines.

2 Quicksort Overview

As in all sorting algorithms, we start with an array A of n numbers; again we assume without loss of generality that the numbers are distinct. Quicksort is very similar to the Select algorithm we studied last lecture. The description of Quicksort is the following:

(1) If the length of A is at most 1: return A (it is trivially sorted).
(2) Pick some element x = A[i]. We call x the pivot.
(3) Split the rest of A into A_< (the elements less than x) and A_> (the elements greater than x).
(4) Rearrange A into [A_<, x, A_>].
(5) Recurse on A_< and A_>.

Steps 2-4 define a partition function on A. The partition function of Quicksort can vary depending on how the pivot is chosen and also on the implementation. Quicksort is often used in practice because we can implement this step in linear time, and with very small constant factors. In addition, the rearrangement (Step 4) can be done in place, rather than making several copies of the array (as is done in MergeSort). In these notes we will not describe the details of an in-place implementation, but the pseudocode can be found in CLRS.
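To make the five-step description above concrete, here is a minimal out-of-place Python sketch (our illustration, not the notes' or CLRS's implementation; the first-element pivot rule is a placeholder, and pivot selection is the subject of the next section):

    # Minimal out-of-place Quicksort sketch (illustration only).
    # Assumes the elements of A are distinct, as in the notes.
    def quicksort(A):
        if len(A) <= 1:                 # Step 1: length <= 1 is trivially sorted
            return A
        x = A[0]                        # Step 2: pick a pivot (placeholder rule)
        less = [a for a in A[1:] if a < x]     # Step 3: elements less than x
        greater = [a for a in A[1:] if a > x]  #         elements greater than x
        # Steps 4-5: recurse, then rearrange as [A_<, x, A_>]
        return quicksort(less) + [x] + quicksort(greater)

    print(quicksort([3, 8, 2, 5, 1, 4, 7, 6]))   # [1, 2, 3, 4, 5, 6, 7, 8]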

3 Speculation on the Runtime

The performance of Quicksort depends on which element is chosen as the pivot. Assume that we choose the k-th smallest element; then |A_<| = k − 1 and |A_>| = n − k. This allows us to write a recurrence. Let T(n) be the runtime of Quicksort on an n-element array. We know the partition step takes O(n) time; therefore the recurrence is

    T(n) ≤ cn + T(k − 1) + T(n − k).

For the worst pivot choice (the maximum or minimum element in the array), the runtime resolves to T(n) ≤ T(n − 1) + O(n) = O(n²).

One way to define the partition function that seems optimal is to pick the median as the pivot. In the above recurrence this would mean that k = ⌈n/2⌉. We showed in the previous lecture that the algorithm Select can find the median element in linear time. Therefore the recurrence becomes

    T(n) ≤ cn + 2T(n/2).

This is exactly the same recurrence as MergeSort, which means that this algorithm is guaranteed to run in O(n log n) time. Unfortunately, the median selection algorithm is not practical; while it runs in linear time, it has much larger constant factors than we would like. To improve on it, we will explore some alternative methods for choosing a pivot. We leave the proof of correctness of Quicksort as an exercise to the reader (a proof by induction is suggested).

3.1 Random Pivot Selection

One method of defending against adversarial inputs is to choose a random element as the pivot. We observe that it is unlikely that the random element will be either the median (the best case) or the maximum or minimum (the worst case). Note that we have a uniform distribution over the n order statistics of the array (that is, for every 1 ≤ i ≤ n we pick the i-th highest element with the same probability, 1/n). However, for the first time, we encounter a randomized algorithm, and the analysis now becomes more complex. How do we analyze a randomized algorithm?
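Before answering that question formally, we can observe the phenomenon empirically. The sketch below (our instrumentation, not part of the original notes; the pivot rules and input size are chosen arbitrarily) counts the comparisons Quicksort makes on an already-sorted array, once with the fixed rule "always pivot on the first element" and once with a uniformly random pivot. The sorted array is adversarial for the fixed rule:

    import random

    # Count comparisons under a pivot rule `pick` (a function returning an
    # index into A). Assumes distinct elements. Illustration only.
    def count_comparisons(A, pick):
        if len(A) <= 1:
            return 0
        x = A[pick(A)]
        rest = [a for a in A if a != x]
        less = [a for a in rest if a < x]
        greater = [a for a in rest if a > x]
        # len(rest) comparisons against the pivot at this level
        return (len(rest) + count_comparisons(less, pick)
                + count_comparisons(greater, pick))

    n = 500
    A = list(range(n))                         # adversarial: already sorted
    first = lambda A: 0                        # fixed pivot rule
    rand = lambda A: random.randrange(len(A))  # random pivot rule
    print(count_comparisons(A, first))         # n(n-1)/2 = 124750
    print(sum(count_comparisons(A, rand) for _ in range(20)) / 20)
                                               # ~4800: Θ(n log n), not Θ(n²)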

4 Worst-Case Analysis

In this section we will derive a bound on the expected running time of Quicksort on a worst-case input. If we consider the worst random choice of pivot at each step, the running time will be Θ(n²). We are thus interested in the running time of Quicksort averaged over all possible choices of the pivots. Note that we still consider the running time for a worst-case input, and average only over the random choices of the algorithm (which is different from averaging over all possible inputs). We formalize the idea of averaging over the random choices of the algorithm by considering the running time of the algorithm on an input I as a random variable and bounding the expectation of that random variable.

Claim 1. For every input array of size n, the expected running time of Quicksort is O(n log n).

Recall that a random variable (often abbreviated as RV) is a function that maps every element in the sample space to a real number. In the case of rolling a die, the real number (the value of the point in the sample space) would be the number on the top of the die. Here the sample space is the set of all possible sequences of pivot choices, and an example of a random variable is the running time of Quicksort on a specific input I.

Denote by z_i the i-th element in the sorted array. For each pair i < j, we define a random variable X_{i,j}(σ) to be the number of times z_i and z_j are compared for a given sequence of pivot choices σ. What are the possible values of X_{i,j}(σ)? It can be 0 if z_i and z_j are never compared. Note that all comparisons are with the pivot, and that the pivot is not included in the subarrays passed to the recursive calls; thus, no two elements are compared twice. Therefore,

    X_{i,j}(σ) ∈ {0, 1}.

Our goal is to compute the expected number of comparisons that Quicksort makes. Recall the definition of expectation:

    E[X] = Σ_σ P(σ) X(σ) = Σ_k k · P(X = k).

An important property of expectation is the linearity of expectation. For any random variables X_1, ..., X_n:

    E[ Σ_{i=1}^{n} X_i ] = Σ_{i=1}^{n} E[X_i].

We start with computing the expected value of X_{i,j}. These variables are indicator random variables, which take the value 1 if some event happens, and 0 otherwise. The expected value is

    E[X_{i,j}] = 1 · P(X_{i,j} = 1) + 0 · P(X_{i,j} = 0) = P(X_{i,j} = 1).

Let C(σ) be the total number of comparisons made by Quicksort for a given sequence of pivot choices σ:

    C(σ) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_{i,j}(σ).

We wish to compute E[C] to get the expected number of comparisons made by Quicksort for an input array of size n. By linearity of expectation,

    E[C] = E[ Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_{i,j} ] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[X_{i,j}] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(z_i, z_j are compared).

Now we find P(z_i, z_j are compared). Note that each element in the array except the pivot is compared to the pivot at each level of the recursion. To analyze P(z_i, z_j are compared), we need to examine the portion [z_i, z_{i+1}, ..., z_j] of the sorted order: as long as pivots are chosen outside this range, all of its elements travel together into the same recursive call. After the range is split using a pivot from [z_i, ..., z_j], z_i and z_j can no longer be compared. Hence, z_i and z_j are compared exactly when, out of [z_i, ..., z_j], either z_i or z_j is the first one picked as the pivot. Since each of the j − i + 1 elements of this range is equally likely to be picked first,

    P(z_i, z_j compared) = P(z_i or z_j is the first pivot picked from [z_i, ..., z_j]) = 1/(j − i + 1) + 1/(j − i + 1) = 2/(j − i + 1).

For example, adjacent order statistics (j = i + 1) are always compared, while the minimum and maximum (i = 1, j = n) are compared with probability only 2/n.
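The 2/(j − i + 1) formula is easy to test empirically. The sketch below (again our instrumentation, not from the notes; n, the number of trials, and the tracked pair are arbitrary choices) runs randomized Quicksort on the array [1, ..., n], so that z_i is simply the value i, records which pairs of values are compared, and estimates the probability that z_i and z_j are compared:

    import random

    # Record every pair of values compared during one run of randomized
    # Quicksort (illustration only; assumes distinct elements).
    def compared_pairs(A, pairs):
        if len(A) <= 1:
            return
        x = A[random.randrange(len(A))]
        rest = [a for a in A if a != x]
        for a in rest:                   # each element is compared to the pivot
            pairs.add((min(a, x), max(a, x)))
        compared_pairs([a for a in rest if a < x], pairs)
        compared_pairs([a for a in rest if a > x], pairs)

    n, trials = 100, 5000
    i, j = 10, 30                        # track order statistics z_10 and z_30
    hits = 0
    for _ in range(trials):
        pairs = set()
        compared_pairs(list(range(1, n + 1)), pairs)
        hits += (i, j) in pairs
    print(hits / trials)                 # ≈ 2/(j - i + 1) = 2/21 ≈ 0.095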

We return to the expected value of C. Note that for a fixed i,

    Σ_{j=i+1}^{n} 2/(j − i + 1) = 2 (1/2 + 1/3 + ... + 1/(n − i + 1)) ≤ 2 (1/2 + 1/3 + ... + 1/n).

And using Σ_{k=2}^{n} 1/k ≤ ∫_1^n (1/x) dx = ln n, we get that

    E[C] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1) ≤ Σ_{i=1}^{n−1} 2 ln n ≤ 2n ln n.

Thus, the expected number of comparisons made by Quicksort is no greater than 2n ln n = O(n log n). To complete the proof, we have to show that the running time is dominated by the number of comparisons. Note that each recursive call to Quicksort on an array of size k performs k − 1 comparisons in order to split the array, and the total work done in that call is O(k). In addition, Quicksort will be called on single-element arrays at most once for each element in the original array, so the total running time of Quicksort is O(C + n). In conclusion, the expected running time of Quicksort on a worst-case input is O(n log n).

4.1 Alternative Proof

Here we provide an alternative method for bounding the expected number of comparisons. Let T(n) be the expected number of comparisons performed by Quicksort on an input of size n. In general, if the pivot is chosen to be the i-th order statistic of the input array,

    T(n) = (n − 1) + T(i − 1) + T(n − i),

where we define T(0) = 0, and the n − 1 term counts the comparisons made by the partition step. Each of the n possible choices of i is equally likely. Thus, the expected number of comparisons satisfies

    T(n) = (n − 1) + (1/n) Σ_{i=1}^{n} ( T(i − 1) + T(n − i) ) = (n − 1) + (2/n) Σ_{i=1}^{n−1} T(i).

We use two facts:

    1. Σ_{i=1}^{n−1} f(i) ≤ ∫_1^n f(x) dx for an increasing function f.
    2. ∫ x ln x dx = (x²/2) ln x − x²/4 + C.

Now we show that T(n) ≤ 2n ln n by induction.

Base case: an array of size 1 is trivially sorted and requires no comparisons. Thus, T(1) = 0.

Inductive hypothesis: T(i) ≤ 2i ln i for all i < n.

Inductive step: we bound T(n):

    T(n) = (n − 1) + (2/n) Σ_{i=1}^{n−1} T(i)
         ≤ (n − 1) + (2/n) Σ_{i=1}^{n−1} 2i ln i            (inductive hypothesis)
         ≤ (n − 1) + (4/n) ∫_1^n x ln x dx                  (Fact 1, since x ln x is increasing)
         = (n − 1) + (4/n) ( (n²/2) ln n − n²/4 + 1/4 )     (Fact 2)
         = (n − 1) + 2n ln n − n + 1/n
         = 2n ln n − 1 + 1/n
         ≤ 2n ln n,

since 1/n ≤ 1. This completes the induction and matches the bound obtained in the first proof.
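As a sanity check on both proofs, the recurrence above can be evaluated numerically and compared against the 2n ln n bound (our illustration, not part of the original notes; the cutoff N and the sampled values of n are arbitrary):

    from math import log

    # T(n) = (n - 1) + (2/n) * sum_{i=1}^{n-1} T(i), with T(0) = T(1) = 0,
    # is the exact expected comparison count; compare it to 2 n ln n.
    N = 1000
    T = [0.0] * (N + 1)
    prefix = 0.0                          # running value of sum_{i=1}^{n-1} T(i)
    for n in range(2, N + 1):
        prefix += T[n - 1]
        T[n] = (n - 1) + 2.0 * prefix / n
    for n in (10, 100, 1000):
        print(n, round(T[n], 1), round(2 * n * log(n), 1))
    # e.g. n = 100: T(n) ≈ 647.9 versus a bound of ≈ 921.0 — the bound holds
    # with room to spare.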