Sorting and Selection


Sorting and Selection: Introduction, Divide and Conquer, Merge-Sort, Quick-Sort, Radix-Sort, Bucket-Sort

Introduction
Assume we have a sequence S storing a list of key-element entries, where the key of the element stored at rank i is key(i). Sorting is the process of rearranging the elements of S so that if i ≤ j then key(i) ≤ key(j), according to the total order relation defined on the keys.

Divide and Conquer
Divide and conquer is a design pattern that lets us solve large problems by decomposing them into smaller, more manageable sub-problems. The steps:
- if the problem is small enough, solve it using a straightforward algorithm
- otherwise, divide the problem into two or more smaller sub-problems
- recursively solve the sub-problems
- combine the results of the sub-problems to obtain the result of the original problem

Sorting Algorithms Review

Sorting Algorithm   Average Performance   Worst Case Performance   Remarks
Bubble Sort         O(n^2)                O(n^2)                   Simple but slow
Insertion Sort      O(n^2)                O(n^2)                   Simple but slow
Selection Sort      O(n^2)                O(n^2)                   Simple but slow
Heap Sort           O(n log n)            O(n log n)               Fast but complicated

Merge Sort
An efficient sorting algorithm based on divide and conquer. Algorithm:
Divide: If S has at least two elements (nothing needs to be done if S has zero or one elements), remove all the elements from S and put them into two sequences, S1 and S2, each containing about half of the elements of S (e.g. S1 contains the first ⌈n/2⌉ elements and S2 contains the remaining ⌊n/2⌋ elements).
Recur: Recursively sort the sequences S1 and S2.
Conquer: Put the elements back into S by merging the sorted sequences S1 and S2 into a single sorted sequence.

Merging Two Sequences
How can we merge two sorted sequences efficiently? We can use the pseudocode shown below.

Merging Two Sequences
Algorithm merge(S1, S2, S):
  Input: Sequences S1 and S2 (on whose elements a total order relation is defined), sorted in non-decreasing order, and an empty sequence S.
  Output: Sequence S containing the union of the elements from S1 and S2, sorted in non-decreasing order; sequences S1 and S2 become empty at the end of the execution.

  while S1 is not empty and S2 is not empty do
      if S1.first().element() ≤ S2.first().element() then
          {move the first element of S1 to the end of S}
          S.insertLast(S1.remove(S1.first()))
      else
          {move the first element of S2 to the end of S}
          S.insertLast(S2.remove(S2.first()))
  {move the remaining elements of S1 to S}
  while S1 is not empty do
      S.insertLast(S1.remove(S1.first()))
  {move the remaining elements of S2 to S}
  while S2 is not empty do
      S.insertLast(S2.remove(S2.first()))
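As a concrete illustration, the merge routine and the overall merge-sort can be sketched in Python, using plain lists and index counters instead of the node-based sequences in the pseudocode (function names are ours):

```python
def merge(s1, s2):
    """Merge two sorted lists into one list sorted in non-decreasing order."""
    result = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        if s1[i] <= s2[j]:            # move the smaller front element to result
            result.append(s1[i])
            i += 1
        else:
            result.append(s2[j])
            j += 1
    result.extend(s1[i:])             # move the remaining elements of s1
    result.extend(s2[j:])             # move the remaining elements of s2
    return result

def merge_sort(s):
    """Sort list s with merge sort; returns a new sorted list."""
    if len(s) <= 1:                   # zero or one elements: already sorted
        return s
    mid = len(s) // 2                 # divide into two halves
    return merge(merge_sort(s[:mid]), merge_sort(s[mid:]))
```

For example, `merge_sort([5, 2, 9, 1, 5, 6])` returns `[1, 2, 5, 5, 6, 9]`.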

Merging Two Sequences
(A step-by-step worked example of merging two sorted sequences appears as figures in the original slides and is not reproduced here.)

Analysis
Proposition 1: The merge-sort tree (see the textbook for details about the merge-sort tree) associated with the execution of merge-sort on a sequence of n elements has height ⌈log n⌉.
Proposition 2: Merge sort sorts a sequence of size n in O(n log n) time.
The only assumption we have made is that the input sequence S and each of the sub-sequences created by the recursive calls of the algorithm can access, insert at, and delete from their first and last nodes in O(1) time.

Analysis
We call the time spent at a node v of the merge-sort tree T the running time of the recursive call associated with v, excluding the recursive calls sent to v's children. If i is the depth of node v, the time spent at v is O(n/2^i), since the size of the sequence associated with v is n/2^i. Observe that T has exactly 2^i nodes at depth i, so the total time spent at depth i is O(2^i · n/2^i), which is O(n). Since the tree has height ⌈log n⌉, the time complexity is O(n log n).

Quick-Sort
A simple sorting algorithm, also based on divide and conquer. The steps:
Divide: If the sequence S has 2 or more elements, select an element x of S to be the pivot; any arbitrary element, such as the last, will do. Remove all the elements of S and divide them into 3 sequences:
- L, holding S's elements less than x
- E, holding S's elements equal to x
- G, holding S's elements greater than x
Recur: Recursively sort L and G.
Conquer: To put the elements back into S in order, first insert the elements of L, then those of E, and finally those of G.
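The three-way partition above can be sketched in Python, with new lists for L, E, and G and the last element as pivot (names are ours):

```python
def quick_sort(s):
    """Three-way quick-sort: partition into L (< pivot), E (== pivot), G (> pivot)."""
    if len(s) < 2:                    # zero or one elements: already sorted
        return s
    pivot = s[-1]                     # arbitrary choice: the last element
    L = [x for x in s if x < pivot]
    E = [x for x in s if x == pivot]
    G = [x for x in s if x > pivot]
    return quick_sort(L) + E + quick_sort(G)   # L, then E, then G
```

For example, `quick_sort([3, 1, 4, 1, 5, 9, 2, 6])` returns `[1, 1, 2, 3, 4, 5, 6, 9]`.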

Example
Select: pick a pivot element x.
Divide: rearrange the elements so that x goes to its final position, with the elements equal to x forming the sequence E. (The accompanying figures are not reproduced here.)

Example
Recur and Conquer: recursively sort the sub-sequences. (Figures omitted.)

In-Place Quick-Sort
Divide step: an index l scans the sequence from the left, and an index r scans it from the right.

In-Place Quick-Sort
A swap is performed when l is at an element larger than the pivot and r is at one smaller than the pivot.

In-Place Quick-Sort
A final swap with the pivot completes the divide step.

Analysis
Consider a quick-sort tree T, and let s_i(n) denote the sum of the input sizes of the nodes at depth i in T. We know that s_0(n) = n, since the root of T is associated with the entire input set. Also, s_1(n) = n − 1, since the pivot is not propagated to the children. Thus either s_2(n) = n − 3, or s_2(n) = n − 2 (if one of the nodes has a zero input size). The worst-case running time of quick-sort is then

O(Σ_{i=0}^{n−1} s_i(n)) = O(Σ_{i=0}^{n−1} (n − i)) = O(Σ_{i=1}^{n} i) = O(n^2)

Thus quick-sort runs in O(n^2) time in the worst case.
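The in-place divide step described above (l scans from the left, r from the right, out-of-place pairs are swapped, and a final swap puts the pivot in place) can be sketched in Python; this is one way to realize the scheme, with the last element of the range as pivot:

```python
def inplace_quick_sort(s, a=0, b=None):
    """Sort the slice s[a..b] of list s in place using in-place quick-sort."""
    if b is None:
        b = len(s) - 1
    if a >= b:                                 # zero or one elements: done
        return
    pivot = s[b]                               # pivot: last element of the range
    l, r = a, b - 1
    while l <= r:
        while l <= r and s[l] <= pivot:        # scan right for an element > pivot
            l += 1
        while l <= r and s[r] >= pivot:        # scan left for an element < pivot
            r -= 1
        if l < r:
            s[l], s[r] = s[r], s[l]            # swap the out-of-place pair
            l, r = l + 1, r - 1
    s[l], s[b] = s[b], s[l]                    # final swap with the pivot
    inplace_quick_sort(s, a, l - 1)            # recur on elements < pivot
    inplace_quick_sort(s, l + 1, b)            # recur on elements > pivot
```

After the final swap, the pivot sits at index l, so the two recursive calls exclude it, mirroring the observation that the pivot is not propagated down the tree.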

Analysis
Now consider the best-case running time. Quick-sort behaves optimally if, whenever a sequence S is divided into sub-sequences L and G, they are of equal size. More precisely:
s_0(n) = n
s_1(n) = n − 1
s_2(n) = n − (1 + 2) = n − 3
s_3(n) = n − (1 + 2 + 2^2) = n − 7
...
s_i(n) = n − (1 + 2 + 2^2 + ... + 2^{i−1}) = n − 2^i + 1
This implies that T has height O(log n).
Best-case time complexity: O(n log n).

Randomized Quick-Sort
Select the pivot as a random element of the sequence. The expected running time of randomized quick-sort on a sequence of size n is O(n log n): the time spent at each level of the quick-sort tree is O(n), and we show that the expected height of the quick-sort tree is O(log n).

Randomized Quick-Sort
Good vs. bad pivots, where n_L is the size of L in a parent sequence of size n:
- good: 1/4 ≤ n_L/n ≤ 3/4
- bad: n_L/n < 1/4 or n_L/n > 3/4
The probability of a good pivot is 1/2, so we expect k/2 good pivots out of k pivots. After a good pivot, the size of each child sequence is at most 3/4 the size of the parent sequence, so after h pivots we expect at most (3/4)^{h/2} · n elements. Hence the expected height h of the quick-sort tree is at most 2 log_{4/3} n.

Decision Tree for Comparison-Based Sorting
(The decision-tree figure is not reproduced here.)

How Fast Can We Sort?
Proposition: The worst-case running time of any comparison-based algorithm for sorting an n-element sequence S is Ω(n log n).
Justification: The running time of a comparison-based sorting algorithm must be at least the depth of the decision tree T associated with the algorithm. Each internal node of T is associated with a comparison that establishes the ordering of two elements of S, and each external node of T represents a distinct permutation of the elements of S. Hence T must have at least n! external nodes, which implies that T has height at least log(n!). Since n! has at least n/2 terms that are greater than or equal to n/2, we have log(n!) ≥ (n/2) log(n/2), which is Ω(n log n).

Can We Sort Faster Than O(n log n)?
As the argument above shows, Ω(n log n) is the best we can do with comparison-based sorting. How about non-comparison-based sorting? Can we sort faster than O(n log n) without comparisons? The answer is yes.
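The bound log(n!) ≥ (n/2) log(n/2) used in the justification can be sanity-checked numerically (a quick check of ours, not part of the original slides):

```python
import math

def log2_factorial(n):
    """log2(n!): the minimum height of a binary decision tree with n! leaves."""
    return sum(math.log2(i) for i in range(2, n + 1))

# Verify log2(n!) >= (n/2) * log2(n/2) for a few values of n.
for n in (4, 16, 100, 1000):
    assert log2_factorial(n) >= (n / 2) * math.log2(n / 2)
```

The slack in the bound is immaterial: both sides grow as Θ(n log n), which is all the lower-bound argument needs.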

Radix-Sort
Unlike the other sorting methods, radix sort considers the structure of the keys. Assume the keys are represented in a base-M number system, where M is the radix; e.g. if M = 2, the keys are represented in binary. Sorting is done by comparing bits in the same position. The method extends to keys that are alphanumeric strings.

Radix Exchange Sort
We examine bits from left to right. First, sort the array with respect to the leftmost bit. (Figures omitted.)

Radix Exchange Sort
Then we partition the array into 2 sub-arrays: the keys whose leftmost bit is 0, followed by those whose leftmost bit is 1.

Radix Exchange Sort
Finally, we recursively sort the top sub-array, ignoring the leftmost bit(s), and recursively sort the bottom sub-array, ignoring the leftmost bit(s). Time to sort n b-bit numbers: O(bn).

Radix Exchange Sort
How do we perform the partition above? Same idea as the partition in quick-sort:
repeat
    scan top-down to find a key starting with 1;
    scan bottom-up to find a key starting with 0;
    exchange the keys;
until the scan indices cross;

Radix Exchange Sort vs. Quick Sort
Similarities:
- both partition the array
- both recursively sort sub-arrays
Differences:
- Method of partitioning: radix exchange divides the array based on whether keys are greater than or less than 2^{b−1}; quick sort partitions based on whether keys are greater than or less than some element of the array.
- Time complexity: radix exchange runs in O(bn); quick sort runs in O(n log n) in the average case.
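The left-to-right radix exchange idea can be sketched in Python. For brevity this sketch partitions into new lists rather than exchanging keys in place as the slides describe; `bit` is the position of the leftmost bit, i.e. b − 1 for b-bit keys (names are ours):

```python
def radix_exchange_sort(a, bit):
    """Sort a list of non-negative ints, examining bits from position `bit` down."""
    if len(a) <= 1 or bit < 0:                     # nothing left to distinguish
        return a
    zeros = [x for x in a if not (x >> bit) & 1]   # keys with a 0 at this bit
    ones = [x for x in a if (x >> bit) & 1]        # keys with a 1 at this bit
    # Recursively sort each part on the remaining, less significant bits.
    return radix_exchange_sort(zeros, bit - 1) + radix_exchange_sort(ones, bit - 1)
```

For 3-bit keys, `radix_exchange_sort([5, 2, 7, 0, 3], 2)` returns `[0, 2, 3, 5, 7]`. Each key is inspected once per bit, giving the O(bn) bound.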

Straight Radix Sort
Examines bits from right to left:
for k ← 0 to b − 1 do
    sort the array in a stable way, looking only at bit k

Stable Sorting
In a stable sort, the initial relative order of equal keys is unchanged. For example, in the first step of the sort above, the relative order of the keys ending in 0 is unchanged, and the same is true of the keys ending in 1.

Stable Sorting
We show that any two keys are in the correct relative order at the end of the algorithm. Given two keys, let k be the leftmost bit position where they differ. At step k, the two keys are put in the correct relative order, and because of stability, the subsequent steps do not change their relative order.

Example
Consider sorting an array containing these two keys. It makes no difference what order they are in when the sort begins: when the sort visits bit k, the keys are put in the correct relative order, and because the sort is stable, their order is not changed when bits more significant than k are compared.

Radix Sort on Decimal Numbers
Straight radix sort works on decimal numbers as well:
for k ← 0 to b − 1 do
    sort the array in a stable way, looking only at digit k
Suppose we can perform the stable sort above in O(n) time; the total time complexity is then O(bn), where b is the number of digits. As you might have guessed, we can perform a stable sort based on the keys' k-th digit in O(n) time. The method? Bucket sort.
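The loop above, with a stable bucket distribution as the per-digit sort, can be sketched in Python (a sketch of ours, not code from the slides):

```python
def radix_sort_decimal(a, digits):
    """LSD radix sort: stable bucket pass on each decimal digit, right to left."""
    for k in range(digits):                        # digit 0 is the least significant
        buckets = [[] for _ in range(10)]          # one bucket per decimal digit
        for x in a:                                # distribute in current order:
            buckets[(x // 10 ** k) % 10].append(x) # this is what makes it stable
        a = [x for bucket in buckets for x in bucket]
    return a
```

For example, `radix_sort_decimal([170, 45, 75, 90, 802, 24, 2, 66], 3)` returns `[2, 24, 45, 66, 75, 90, 170, 802]`: each of the b = 3 passes costs O(n + 10), for O(bn) overall.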

Bucket Sort
Sorts n numbers, each in {1, 2, 3, ..., m}. It is stable and runs in O(n + m) time. For example, suppose m = 3 and our array contains two 2's and two 1's. First, we create m buckets, one per possible key value. (Figures omitted.)

Example
Now, pull the elements from the buckets back into the array, one bucket at a time. The result is the sorted array, sorted in a stable way. (Figures omitted.)
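A minimal sketch of bucket sort in Python, using (key, value) pairs so that stability is visible (names are ours):

```python
def bucket_sort(pairs, m):
    """Stable bucket sort of (key, value) pairs with integer keys in 1..m.

    O(n + m) time: one pass to distribute, one pass to collect.
    """
    buckets = [[] for _ in range(m + 1)]       # bucket per key; index 0 unused
    for key, value in pairs:                   # distribute in input order
        buckets[key].append((key, value))
    return [pair for bucket in buckets for pair in bucket]
```

For example, `bucket_sort([(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')], 3)` returns `[(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]`: equal keys keep their input order, which is exactly the stability that straight radix sort relies on.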

Sorting Algorithms Summary

Sorting Algorithm   Average Performance   Worst Case Performance   Remarks
Bubble Sort         O(n^2)                O(n^2)                   Simple but slow
Insertion Sort      O(n^2)                O(n^2)                   Simple but slow
Selection Sort      O(n^2)                O(n^2)                   Simple but slow
Heap Sort           O(n log n)            O(n log n)               Fast but complicated
Merge Sort          O(n log n)            O(n log n)               Fast but still relatively complicated
Quick Sort          O(n log n)            O(n^2)                   Fast and simple, but poor performance in the worst case
Integer Sort        O(n)                  O(n)                     Fast and simple, but only applicable to integer keys

Selection
Finding the minimum or maximum element of an unsorted sequence takes O(n) time. This problem can be generalized to finding the k-th smallest element of an unsorted sequence. We could first sort the sequence and then return the element stored at rank k − 1, but that takes O(n log n) time due to the sorting. We can do better.

Prune and Search
Also called decrease-and-conquer, this is a design pattern that is also used in binary search: we find the solution by pruning away a fraction of the objects in the original problem and solving the remainder recursively. The prune-and-search algorithm we discuss here is called randomized quick-select. Not surprisingly, randomized quick-select is very similar to randomized quick-sort.

Randomized Quick-Select
Algorithm quickselect(S, k):
  Input: An unsorted sequence S containing n comparable elements, and an integer k ∈ {1, ..., n}
  Output: The k-th smallest element of S

  if n = 1 then
      return the (first) element of S
  pick a random element x of S
  remove all the elements from S and put them into 3 sequences:
      L, storing the elements of S less than x
      E, storing the elements of S equal to x
      G, storing the elements of S greater than x
  if k ≤ |L| then
      return quickselect(L, k)
  else if k ≤ |L| + |E| then
      return x                                    {every element of E is equal to x}
  else
      return quickselect(G, k − |L| − |E|)        {note the new selection parameter}

Performance: worst case O(n^2); expected O(n).
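The quickselect pseudocode above translates almost directly to Python:

```python
import random

def quickselect(s, k):
    """Return the k-th smallest element of list s (k = 1 gives the minimum)."""
    if len(s) == 1:
        return s[0]
    x = random.choice(s)               # pick a random pivot element
    L = [e for e in s if e < x]        # elements less than x
    E = [e for e in s if e == x]       # elements equal to x
    G = [e for e in s if e > x]        # elements greater than x
    if k <= len(L):
        return quickselect(L, k)
    elif k <= len(L) + len(E):
        return x                       # every element of E is equal to x
    else:
        # prune L and E; adjust the selection parameter accordingly
        return quickselect(G, k - len(L) - len(E))
```

For example, `quickselect([7, 4, 9, 1, 5], 3)` returns `5`, the median. Exactly one of the three branches recurses, pruning the rest of the sequence, which is what makes the expected running time linear.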