Quicksort (CLRS 7)

We saw the divide-and-conquer technique at work in Mergesort. Mergesort summary:

- Divide the n-element array A into two subarrays of n/2 elements each
- Sort the two subarrays recursively
- Merge the two subarrays

Running time: T(n) = 2T(n/2) + Θ(n), which solves to T(n) = Θ(n log n). Mergesort is not in-place.

Another possibility is to divide the elements so that there is no need for merging:

- Partition A[1..n] into subarrays A' = A[1..q] and A'' = A[q+1..n] such that all elements in A'' are larger than all elements in A'.
- Recursively sort A' and A''. (There is no need to combine/merge: A is already sorted once A' and A'' are sorted.)

Thus the pseudocode for Quicksort looks as follows. As usual with recursive functions on arrays, we pass the array indices p and r as arguments.

Quicksort(A, p, r)
    IF p < r THEN
        q = Partition(A, p, r)
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)

To sort the entire array, call Quicksort(A, 1, n).

Let's ignore for the moment how Partition actually works; we'll get to that in a second. What can we expect in terms of running time? If q = n/2 and we partition in Θ(n) time, we again get the recurrence T(n) = 2T(n/2) + Θ(n), and the running time is T(n) = Θ(n log n). The question is: can we develop a partition algorithm that divides A into (approximately) two halves?
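As a quick sanity check on the recurrence, here is a small numeric illustration of my own (not from the notes): iterating T(n) = 2T(n/2) + n with T(1) = 1 over powers of two shows T(n)/(n log2 n) approaching 1.

from functools import lru_cache
import math

@lru_cache(maxsize=None)
def T(n):
    # The balanced-split recurrence T(n) = 2T(n/2) + n with T(1) = 1,
    # evaluated for n a power of two
    return 1 if n <= 1 else 2 * T(n // 2) + n

for k in (10, 15, 20):
    n = 2 ** k
    print(n, T(n) / (n * math.log2(n)))   # ratios 1.1, 1.066..., 1.05

In fact T(2^k) = 2^k (k+1) exactly, so the ratio is (k+1)/k, which tends to 1.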

Again, let's ignore for the moment how Partition actually works; we'll get to that in a second.

Why is Quicksort correct? This can be shown inductively, assuming that Partition works correctly (which can be shown separately). Basically, if Partition works as expected and Quicksort correctly sorts the two sides of the partition (this is the inductive hypothesis), then it is easy to see that the whole array ends up sorted correctly. For those of you who know induction, this should be pretty straightforward. For those of you who don't know induction, either do some vigorous hand waving or, better, read up on induction; there are some great links on the class website.

Partition

Now that we have the big picture, let's dive into how to partition. First, note that it would be pretty easy to partition if we did not have to do it in place (think, think...). The trick is doing it in place. If we can come up with an in-place partition, this immediately leads to an in-place quicksort, which is nice. Below is the code from the textbook (CLRS). It is called Lomuto partition, after the person who first described it.

Partition(A, p, r)
    x = A[r]
    i = p - 1
    FOR j = p TO r-1 DO
        IF A[j] <= x THEN
            i = i + 1
            Exchange A[i] and A[j]
        FI
    OD
    Exchange A[i+1] and A[r]
    RETURN i + 1

Example (pivot x = 4, p = 1, r = 8):

2 1 7 8 3 5 6 4    initial, i = 0
2 1 7 8 3 5 6 4    j = 1: 2 <= 4, so i = 1, exchange A[1] with A[1]
2 1 7 8 3 5 6 4    j = 2: 1 <= 4, so i = 2, exchange A[2] with A[2]
2 1 7 8 3 5 6 4    j = 3: 7 > 4, nothing happens
2 1 7 8 3 5 6 4    j = 4: 8 > 4, nothing happens
2 1 3 8 7 5 6 4    j = 5: 3 <= 4, so i = 3, exchange A[3] with A[5]
2 1 3 8 7 5 6 4    j = 6: 5 > 4, nothing happens
2 1 3 8 7 5 6 4    j = 7: 6 > 4, nothing happens
2 1 3 4 7 5 6 8    after the loop: exchange A[4] with A[8]; return q = 4
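The pseudocode translates directly into runnable Python. This is a sketch of my own using 0-based indices, so the whole array is sorted by quicksort(A, 0, len(A)-1); running partition on the example array reproduces the trace above.

def partition(A, p, r):
    # Lomuto partition of A[p..r] around the pivot x = A[r]
    x = A[r]
    i = p - 1                          # A[p..i] holds elements <= x
    for j in range(p, r):              # A[i+1..j-1] holds elements > x
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]    # move the pivot to its final position
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [2, 1, 7, 8, 3, 5, 6, 4]
print(partition(list(A), 0, 7))   # 3, i.e. q = 4 in the 1-based trace
quicksort(A, 0, len(A) - 1)
print(A)                          # [1, 2, 3, 4, 5, 6, 7, 8]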

Partition can be proved correct using the following loop invariants:

- A[k] <= x for p <= k <= i
- A[k] > x for i+1 <= k <= j-1
- A[k] = x for k = r

These are true before the first execution of the loop, when i = p-1 and j = p, and remain true after every iteration. At the end of the loop, j = r, and the invariants imply that Partition works correctly. Formally, one would use induction.

Partition analysis: Partition runs in time Θ(r - p) (it does one pass through the input), which is linear in the number of elements being partitioned.

Quicksort analysis

The running time depends on how well Partition divides A. In the example above it does reasonably well. If the array is always partitioned nicely in two halves (Partition returns q = (p+r)/2), we have the recurrence T(n) = 2T(n/2) + Θ(n), so T(n) = Θ(n lg n).

But in the worst case, Partition always returns q = p or q = r, and the running time becomes T(n) = Θ(n) + T(0) + T(n-1), which solves to T(n) = Θ(n^2). Maybe even worse: this worst case occurs when A is already sorted.

Let's dig a step further. All good splits give us Θ(n lg n) and all bad splits give us Θ(n^2). What about a combination of good and bad splits? Note that a good split is not necessarily a perfectly balanced one (1/2-to-1/2). Any split that puts a constant fraction of the elements on each side is good. Example: a split into 9n/10 and n/10 gives

T(n) = T(9n/10) + T(n/10) + n,

whose solution is Θ(n lg n). So splits like 1/2-to-1/2, 1/3-to-2/3, ..., 1/9-to-8/9, ... are all good. There are a LOT of good splits!!! Even if only every other split is good, we can show that Quicksort still performs well (a bad split is absorbed into the neighboring good split). Intuitively, there are a LOT of cases where quicksort runs in Θ(n lg n) time. Can we justify this theoretically?
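Here is a small numeric check of my own (not from the notes) contrasting the two recurrences: the 9/10-to-1/10 recurrence stays within a constant factor of n lg n, while the worst-case recurrence T(n) = T(n-1) + n is quadratic.

from functools import lru_cache
import math

@lru_cache(maxsize=None)
def T_good(n):
    # Every split puts 9/10 of the elements on one side, 1/10 on the other
    return 1 if n <= 1 else T_good(9 * n // 10) + T_good(n // 10) + n

def T_bad(n):
    # Every split is maximally unbalanced: T(n) = T(n-1) + n = n(n+1)/2
    return n * (n + 1) // 2

for n in (10**3, 10**4, 10**5):
    print(n, T_good(n) / (n * math.log2(n)), T_bad(n) / n**2)
    # the first ratio stays bounded (around 2) as n grows;
    # the second stays near 1/2, i.e. quadratic growth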

Average running time

Formally, we want to find the average-case running time of quicksort. Is it close to the worst case, Θ(n^2), or to the best case, Θ(n lg n)? We have to be careful here about the set of inputs over which we take the average, because the average time depends on the distribution of those inputs. If we run quicksort on a set of inputs that are all almost sorted, the average running time will be close to the worst case. Similarly, if we run quicksort on a set of inputs that give good splits, the average running time will be close to the best case.

Most of the time, people analyze average time assuming a uniform distribution of inputs. What happens if we run quicksort on a set of inputs picked uniformly at random from the space of all possible input permutations? Then the average case is close to the best case. Why? Intuitively, if any input ordering is equally likely, then we expect at least as many good splits as bad splits; therefore, on average, a bad split is followed by a good split and gets absorbed into it. So, under the assumption that all input permutations are equally likely, the average time of Quicksort is Θ(n lg n). This can be proved formally, but we won't do it in this class.

Is it realistic to assume that all input permutations are equally likely? Not really. In many cases the input is almost sorted (e.g. rebuilding an index in a database, etc.). If the assumption of a uniform distribution is not realistic, then the fact that Quicksort has a good average time does not help us much. Can we make Quicksort have a good average time irrespective of the input distribution? YES, using randomization!! See below.

Randomization

We now consider what are called randomized algorithms, that is, algorithms that make some random choices during their execution. The running time of a normal, deterministic algorithm depends only on the input. The running time of a randomized algorithm depends not only on the input but also on the random choices made by the algorithm. The running time of a randomized algorithm is not fixed for a given input!
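To see this dependence on the input distribution concretely, here is a quick experiment of my own (not from the notes): count the comparisons performed by the deterministic, last-element-pivot quicksort on random permutations versus an already-sorted input.

import random

def lomuto_comparisons(A):
    # Deterministic quicksort (Lomuto, pivot = last element) on a copy of A,
    # written iteratively so sorted inputs don't overflow the recursion stack;
    # returns the number of element comparisons performed.
    A = list(A)
    count = 0
    stack = [(0, len(A) - 1)]
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x = A[r]
        i = p - 1
        for j in range(p, r):
            count += 1                # one comparison A[j] <= x
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        stack.append((p, i))          # left part  A[p..q-1], with q = i+1
        stack.append((i + 2, r))      # right part A[q+1..r]
    return count

n = 2000
trials = [random.sample(range(n), n) for _ in range(20)]
print(sum(lomuto_comparisons(A) for A in trials) / 20)   # about 1.4 n lg n
print(lomuto_comparisons(range(n)))                      # n(n-1)/2 = 1999000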

Randomized algorithms have best-case and worst-case running times, but the inputs on which these are achieved are not known in advance; they can be any of the inputs. We are normally interested in analyzing the expected running time T_e(n) of a randomized algorithm, that is, the expected (average) running time over all inputs of size n. Here T(X) denotes the running time on input X (of size n):

T_e(n) = E_{|X|=n}[T(X)]

Randomized Quicksort

There are two ways to go about randomizing Quicksort.

1. We can enforce that all n! permutations are equally likely by randomly permuting the input before running the algorithm. Most computers have a pseudo-random number generator random(1, n) returning a random number between 1 and n. Using it, we can generate a random permutation (such that all n! permutations are equally likely) in O(n) time: choose the element for A[1] randomly among the elements in A[1..n], choose the element for A[2] randomly among the elements in A[2..n], choose the element for A[3] randomly among the elements in A[3..n], and so on.

2. Alternatively, we can modify Partition slightly: exchange the last element of A with a random element of A before partitioning.

RandPartition(A, p, r)
    i = Random(p, r)
    Exchange A[r] and A[i]
    RETURN Partition(A, p, r)

RandQuicksort(A, p, r)
    IF p < r THEN
        q = RandPartition(A, p, r)
        RandQuicksort(A, p, q-1)
        RandQuicksort(A, q+1, r)

It can be shown formally that the expected running time of randomized quicksort (on inputs of size n) is Θ(n lg n). We will not do it in this class, but I encourage you to read the textbook!

NOTES

In a future lecture (selection) we will see how to make quicksort run in worst-case O(n log n) time.
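Both versions are easy to realize in Python (a sketch of my own; random.shuffle performs the O(n) random-permutation step of option 1, and the code below implements option 2):

import random

def partition(A, p, r):
    # Lomuto partition around the pivot x = A[r], as before
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def rand_partition(A, p, r):
    # Exchange A[r] with an element of A[p..r] chosen uniformly at random,
    # then partition as usual
    i = random.randint(p, r)
    A[r], A[i] = A[i], A[r]
    return partition(A, p, r)

def rand_quicksort(A, p, r):
    if p < r:
        q = rand_partition(A, p, r)
        rand_quicksort(A, p, q - 1)
        rand_quicksort(A, q + 1, r)

A = [5, 2, 9, 1, 7, 3]
rand_quicksort(A, 0, len(A) - 1)
print(A)   # [1, 2, 3, 5, 7, 9]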

Trivia: Quicksort was invented by Sir Charles Antony Richard "Tony" Hoare, a British computer scientist and winner of the 1980 Turing Award, in 1960, while he was a visiting student at Moscow State University. While in Moscow he was in the school of Kolmogorov (the well-known mathematician).

The original Quicksort was described with a different partition algorithm, one that works from both ends of the array; this is referred to as Hoare partition (after Hoare, who invented Quicksort). We'll look at this different way to partition in the lab.

Both Hoare's and Lomuto's partitions are in-place, but neither of them is stable. Thus, in an efficient implementation with an in-place partition, Quicksort is NOT stable. I don't know of any practical comparison of the two partitions, and I would guess they are both about equally fast in practice. Lomuto partition may be considered simpler and better because it uses a single pointer (i, which points to the last element left of the pivot), and this is why textbooks prefer it. On the other hand, Hoare partition does well (i.e., O(n lg n)) when all elements are equal, while Lomuto partition is quadratic in that case. The idea behind Lomuto's partition can be extended to deal with elements equal to the pivot (place them in the middle instead of carrying them through the recursion); this leads to 3-way partitioning. Quicksort can be made stable by using extra space (so it is no longer in place).
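For the curious, here is a sketch of my own of a two-ended, CLRS-style Hoare partition (we will study it properly in the lab; note that the returned index j only guarantees that every element of A[p..j] is <= every element of A[j+1..r], so the recursion differs slightly from the Lomuto version):

def hoare_partition(A, p, r):
    # Scan from both ends toward the middle around the pivot x = A[p];
    # stop when the scans cross and return the crossing point j.
    x = A[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while A[j] > x:
            j -= 1
        i += 1
        while A[i] < x:
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j

def hoare_quicksort(A, p, r):
    if p < r:
        q = hoare_partition(A, p, r)
        hoare_quicksort(A, p, q)       # note: (p, q), not (p, q-1)
        hoare_quicksort(A, q + 1, r)

A = [4, 4, 4, 4, 4, 4, 4, 4]
hoare_quicksort(A, 0, len(A) - 1)      # all-equal input: splits stay balanced
print(A)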