CSE373: Data Structure & Algorithms Lecture 21: More Comparison Sorting Aaron Bauer Winter 2014

The main problem, stated carefully

For now, assume we have n comparable elements in an array and we want to rearrange them to be in increasing order.

Input:
- An array A of data records
- A key value in each data record
- A comparison function (consistent and total)

Effect:
- Reorganize the elements of A such that for any i and j, if i < j then A[i] <= A[j]
- (Also, A must have exactly the same data it started with)
- Could also sort in reverse order, of course

An algorithm doing this is a comparison sort.
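To make the comparison model concrete, here is a minimal Java sketch of this contract (the interface name ComparisonSort is our illustration, not from the slides; java.util.Comparator is the standard library type). The point is that the algorithm may inspect elements only through the comparison function:

    import java.util.Comparator;

    interface ComparisonSort {
        // Effect: a becomes a permutation of its original contents such that
        // for all i < j, cmp.compare(a[i], a[j]) <= 0 (increasing order).
        <T> void sort(T[] a, Comparator<T> cmp);
    }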

Sorting: The Big Picture

Surprising amount of neat stuff to say about sorting:
- Simple algorithms, O(n^2): insertion sort, selection sort, shell sort
- Fancier algorithms, O(n log n): heap sort, merge sort, quick sort (avg)
- Comparison lower bound: Omega(n log n)
- Specialized algorithms, O(n): bucket sort, radix sort
- Handling huge data sets: external sorting

Mergesort Analysis

Having defined an algorithm and argued it is correct, we should analyze its running time and space. To sort n elements, we:
- Return immediately if n = 1
- Else do 2 subproblems of size n/2 and then an O(n) merge

Recurrence relation:
    T(1) = c1
    T(n) = 2T(n/2) + c2*n
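As a concrete reference point for this recurrence, here is a minimal Java mergesort sketch (our own illustration, not the course's code; helper names are hypothetical). The two recursive calls are the 2T(n/2) term and merge is the c2*n term:

    import java.util.Arrays;

    static int[] mergesort(int[] a) {
        if (a.length <= 1) return a;                                    // base case: T(1) = c1
        int mid = a.length / 2;
        int[] left  = mergesort(Arrays.copyOfRange(a, 0, mid));        // T(n/2)
        int[] right = mergesort(Arrays.copyOfRange(a, mid, a.length)); // T(n/2)
        return merge(left, right);                                      // O(n) merge
    }

    static int[] merge(int[] l, int[] r) {
        int[] out = new int[l.length + r.length];   // auxiliary space, as noted below
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length)        // repeatedly take the smaller head
            out[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
        while (i < l.length) out[k++] = l[i++];     // drain whichever side remains
        while (j < r.length) out[k++] = r[j++];
        return out;
    }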

One of the recurrence classics

For simplicity let constants be 1 (no effect on the asymptotic answer):

    T(1) = 1
    T(n) = 2T(n/2) + n
         = 2(2T(n/4) + n/2) + n
         = 4T(n/4) + 2n
         = 4(2T(n/8) + n/4) + 2n
         = 8T(n/8) + 3n
         ...
         = 2^k T(n/2^k) + kn

So the total is 2^k T(n/2^k) + kn where n/2^k = 1, i.e., k = log n.
That is, 2^(log n) T(1) + n log n = n + n log n = O(n log n).

Or more intuitively...

This recurrence is so common you just know it's O(n log n).

Merge sort is relatively easy to intuit (best, worst, and average):
- The recursion tree will have log n height
- At each level we do a total amount of merging equal to n

Quicksort

- Also uses divide-and-conquer
  - Recursively chop into two pieces
  - Instead of doing all the work as we merge together, we will do all the work as we recursively split into halves
  - Unlike merge sort, does not need auxiliary space
- O(n log n) on average :), but O(n^2) worst-case :(
- Faster than merge sort in practice? Often believed so. It does fewer copies and more comparisons, so it depends on the relative cost of these two operations!

Quicksort Overview

1. Pick a pivot element
2. Partition all the data into:
   A. The elements less than the pivot
   B. The pivot
   C. The elements greater than the pivot
3. Recursively sort A and C
4. The answer is, as simple as A, B, C

(Alas, there are some details lurking in this algorithm; a minimal sketch follows.)
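A minimal Java sketch of steps 1-4 (our illustration; choosePivotIndex and partition are hypothetical helper names, with sketches following the pivot-rule and partitioning slides below):

    // Sorts arr[lo..hi) in place.
    static void quicksort(int[] arr, int lo, int hi) {
        if (hi - lo <= 1) return;                      // 0 or 1 elements: already sorted
        int p = choosePivotIndex(arr, lo, hi);         // step 1: pick a pivot
        int mid = partition(arr, lo, hi, p);           // step 2: A, pivot, C around index mid
        quicksort(arr, lo, mid);                       // step 3: recursively sort A
        quicksort(arr, mid + 1, hi);                   //         ... and C
    }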

Think in Terms of Sets

    S = {13, 81, 92, 43, 65, 31, 57, 26, 75, 0}
    Select pivot value: 65
    Partition: S1 = {13, 0, 43, 31, 26, 57}   S2 = {92, 75, 81}
    Quicksort(S1) and Quicksort(S2): S1 = {0, 13, 26, 31, 43, 57}   S2 = {75, 81, 92}
    Presto! S is sorted: 0 13 26 31 43 57 65 75 81 92

[Weiss]

Example, Showing Recursion

    Start:                8 2 9 4 5 3 1 6
    Divide (pivot 5):     2 4 3 1 | 5 | 8 9 6
    Divide again:         2 1 | 3 | 4   and   6 | 8 | 9
    Divide, 1 element:    1 | 2
    Conquer:              1 2
    Conquer:              1 2 3 4   and   6 8 9
    Conquer:              1 2 3 4 5 6 8 9

Details

Have not yet explained:
- How to pick the pivot element
  - Any choice is correct: data will end up sorted
  - But as the analysis will show, we want the two partitions to be about equal in size
- How to implement partitioning
  - In linear time
  - In place

Pivots

- Best pivot? The median: halve each time.
      8 2 9 4 5 3 1 6  ->  2 4 3 1 | 5 | 8 9 6
- Worst pivot? The greatest/least element: a subproblem of size n - 1, giving O(n^2).
      8 2 9 4 5 3 1 6  ->  1 | 8 2 9 4 5 3 6

Potential pivot rules

While sorting arr from lo (inclusive) to hi (exclusive):
- Pick arr[lo] or arr[hi-1]: fast, but the worst case occurs with mostly sorted input
- Pick a random element in the range: does as well as any technique, but (pseudo)random number generation can be slow; still probably the most elegant approach
- Median of 3, e.g., arr[lo], arr[hi-1], arr[(hi+lo)/2]: a common heuristic that tends to work well
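A small sketch of the median-of-3 rule (our illustration; the helper name choosePivotIndex is hypothetical). It returns the index of the median of arr[lo], arr[(hi+lo)/2], and arr[hi-1]:

    static int choosePivotIndex(int[] arr, int lo, int hi) {
        int mid = (hi + lo) / 2;
        int a = arr[lo], b = arr[mid], c = arr[hi - 1];
        if ((a <= b) == (b <= c)) return mid;   // b lies between a and c (either direction)
        if ((b <= a) == (a <= c)) return lo;    // a lies between b and c
        return hi - 1;                          // otherwise c is the median
    }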

Partitioning

Conceptually simple, but the hardest part to code up correctly. After picking the pivot, we need to partition in linear time, in place.

One approach (there are slightly fancier ones):
1. Swap pivot with arr[lo]
2. Use two fingers i and j, starting at lo+1 and hi-1
3. while (i < j)
     if (arr[j] > pivot) j--
     else if (arr[i] < pivot) i++
     else swap arr[i] with arr[j]
4. Swap pivot with arr[i]*

*Skip step 4 if the pivot ends up being the least element.
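A direct Java transcription of steps 1-4 (our sketch; it assumes distinct elements, as on the earlier slide, and returns the pivot's final index):

    static int partition(int[] arr, int lo, int hi, int pivotIndex) {
        swap(arr, lo, pivotIndex);              // step 1: stash the pivot at arr[lo]
        int pivot = arr[lo];
        int i = lo + 1, j = hi - 1;             // step 2: two fingers
        while (i < j) {                         // step 3: close the fingers
            if (arr[j] > pivot)      j--;       // arr[j] already on the correct side
            else if (arr[i] < pivot) i++;       // arr[i] already on the correct side
            else swap(arr, i, j);               // both out of place: exchange them
        }
        if (arr[i] < pivot) {                   // step 4 (skipped if pivot is least)
            swap(arr, lo, i);
            return i;
        }
        return lo;                              // pivot was the least element
    }

    static void swap(int[] arr, int a, int b) {
        int tmp = arr[a]; arr[a] = arr[b]; arr[b] = tmp;
    }

Running this on the example that follows (pivot 6 moved to index 0) reproduces the partition trace shown below.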

Example

Step one: pick the pivot as the median of 3, with lo = 0, hi = 10 (the candidates are arr[0] = 8, arr[5] = 3, and arr[9] = 6, so the median is 6):

    index: 0 1 2 3 4 5 6 7 8 9
    value: 8 1 4 9 0 3 5 2 7 6

Step two: move the pivot to the lo position:

    index: 0 1 2 3 4 5 6 7 8 9
    value: 6 1 4 9 0 3 5 2 7 8

Example

Now partition in place:

    Start:         6 1 4 9 0 3 5 2 7 8
    Move fingers:  6 1 4 9 0 3 5 2 7 8
    Swap:          6 1 4 2 0 3 5 9 7 8
    Move fingers:  6 1 4 2 0 3 5 9 7 8
    Move pivot:    5 1 4 2 0 3 6 9 7 8

(Often there is more than one swap during a partition; this is a short example.)
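As a hypothetical end-to-end check (assuming the quicksort, choosePivotIndex, partition, and swap sketches above are in scope), this demo sorts the array from this example:

    public static void main(String[] args) {
        int[] arr = {8, 1, 4, 9, 0, 3, 5, 2, 7, 6};        // the array from this example
        quicksort(arr, 0, arr.length);                      // median-of-3 picks 6, as above
        System.out.println(java.util.Arrays.toString(arr)); // prints [0, 1, 2, ..., 9]
    }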

Analysis

- Best-case: the pivot is always the median
      T(0) = T(1) = 1
      T(n) = 2T(n/2) + n (linear-time partition)
  Same recurrence as mergesort: O(n log n)
- Worst-case: the pivot is always the smallest or largest element
      T(0) = T(1) = 1
      T(n) = 1T(n-1) + n
  Basically the same recurrence as selection sort: unrolling gives n + (n-1) + ... + 2 + 1 = n(n+1)/2, which is O(n^2)
- Average-case (e.g., with a random pivot): O(n log n); you are not responsible for the proof (it is in the text)

Cutoffs

- For small n, all that recursion tends to cost more than doing a quadratic sort. Remember asymptotic complexity is for large n.
- Common engineering technique: switch algorithms below a cutoff. A reasonable rule of thumb: use insertion sort for n < 10.
- Notes:
  - Could also use a cutoff for merge sort
  - Cutoffs are also the norm with parallel algorithms (switch to a sequential algorithm)
  - None of this affects asymptotic complexity

Cutoff skeleton

    void quicksort(int[] arr, int lo, int hi) {
        if (hi - lo < CUTOFF)
            insertionSort(arr, lo, hi);
        else {
            // pick a pivot, partition, and recurse as before
        }
    }

Notice how this cuts out the vast majority of the recursive calls:
- Think of the recursive calls to quicksort as a tree
- The cutoff trims out the bottom layers of the tree
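For completeness, here is a minimal insertionSort over the half-open range arr[lo..hi) that the skeleton calls (our sketch; the slides do not give its body):

    static void insertionSort(int[] arr, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int x = arr[i];                    // next element to place
            int j = i;
            while (j > lo && arr[j - 1] > x) { // shift larger elements one slot right
                arr[j] = arr[j - 1];
                j--;
            }
            arr[j] = x;                        // drop x into its sorted position
        }
    }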

Visualizations

http://www.cs.usfca.edu/~galles/visualization/algorithms.html

How Fast Can We Sort?

- Heapsort and mergesort have O(n log n) worst-case running time
- Quicksort has O(n log n) average-case running time
- These bounds are all tight, actually Theta(n log n)
- So maybe we need to dream up another algorithm with a lower asymptotic complexity, such as O(n) or O(n log log n)
  - Instead: we know that this is impossible, assuming our comparison model: the only operation an algorithm can perform on data items is a 2-element comparison

A General View of Sorting

- Assume we have n elements to sort; for simplicity, assume none are equal (no duplicates)
- How many permutations of the elements (possible orderings)?

Example, n = 3:

    a[0]<a[1]<a[2]   a[0]<a[2]<a[1]   a[1]<a[0]<a[2]
    a[1]<a[2]<a[0]   a[2]<a[0]<a[1]   a[2]<a[1]<a[0]

In general: n choices for the least element, n-1 for the next, n-2 for the next, ...

    n(n-1)(n-2)...(2)(1) = n! possible orderings

Counting Comparisons

- So every sorting algorithm has to find the right answer among the n! possible answers
  - It starts knowing nothing; anything is possible
  - It gains information with each comparison
  - Intuition: each comparison can at best eliminate half the remaining possibilities
  - It must narrow the answer down to a single possibility
- What we can show: any sorting algorithm must do at least (1/2)n log n - (1/2)n comparisons, which is Omega(n log n)
  - Otherwise there are at least two permutations among the n! possible that cannot yet be distinguished, so the algorithm would have to guess and could be wrong [an incorrect algorithm]

Optional: Counting Comparisons

- We don't know what the algorithm is, but it cannot make progress without doing comparisons
  - Eventually it does a first comparison: "is a < b?"
  - It can use the result to decide what second comparison to do
  - Etc.: comparison k can be chosen based on the first k-1 results
- We can represent this process as a decision tree
  - Nodes contain the set of remaining possibilities; at the root, none of the n! options is yet eliminated
  - Edges are answers from a comparison
  - The algorithm does not actually build the tree; it's what our proof uses to represent the most the algorithm could know so far as the algorithm progresses

Optional: One Decision Tree for n=3

    Root (nothing eliminated): a<b<c, a<c<b, c<a<b, b<a<c, b<c<a, c<b<a
      If a < b: a<b<c, a<c<b, c<a<b
        If b < c: a<b<c
        If b > c: a<c<b, c<a<b
          If a < c: a<c<b
          If a > c: c<a<b
      If a > b: b<a<c, b<c<a, c<b<a
        If b < c: b<a<c, b<c<a
          If c < a: b<c<a
          If c > a: b<a<c
        If b > c: c<b<a

The leaves contain all the possible orderings of a, b, c. A different algorithm would lead to a different tree.

Optional: Example

Suppose the actual order is a < c < b. The algorithm's comparisons trace one root-to-leaf path in the tree above: a < b narrows the 6 possible orders down to {a<b<c, a<c<b, c<a<b}, then b > c narrows those to {a<c<b, c<a<b}, and finally a < c leaves the single possibility a < c < b.

Optional: What the Decision Tree Tells Us

- It is a binary tree because each comparison has 2 outcomes
  - (We assume no duplicate elements)
  - (A comparison would have 1 outcome only if the algorithm asks redundant questions)
- Because any data is possible, any algorithm needs to ask enough questions to produce all n! answers
  - Each answer is a different leaf
  - So the tree must be big enough to have n! leaves
- Running any algorithm on any input will at best correspond to a root-to-leaf path in some decision tree with n! leaves
- So no algorithm can have worst-case running time better than the height of a tree with n! leaves
  - The worst-case number of comparisons for an algorithm is given by an input leading to a longest path in the algorithm's decision tree

Optional: Where Are We

- Proven: no comparison sort can have worst-case running time better than the height of a binary tree with n! leaves
  - A comparison sort could be worse than this height, but it cannot be better
- Now: a binary tree with n! leaves has height Omega(n log n)
  - The height could be more, but it cannot be less
  - The factorial function grows very quickly
- Conclusion: comparison sorting is Omega(n log n)
  - An amazing computer-science result: it proves all the clever programming in the world cannot comparison-sort in linear time

Optional: Height Lower Bound

The height of a binary tree with L leaves is at least log₂ L. So for the height h of our decision tree:

    h >= log₂(n!)                                      property of binary trees
       = log₂(n*(n-1)*(n-2)...(2)(1))                  definition of factorial
       = log₂ n + log₂(n-1) + ... + log₂ 1             property of logarithms
      >= log₂ n + log₂(n-1) + ... + log₂(n/2)          drop smaller terms (each >= 0)
      >= log₂(n/2) + log₂(n/2) + ... + log₂(n/2)       shrink each term to log₂(n/2)
       = (n/2) log₂(n/2)                               arithmetic
       = (n/2)(log₂ n - log₂ 2)                        property of logarithms
       = (1/2)n log₂ n - (1/2)n                        arithmetic
       = Omega(n log n)