CS2223: Algorithms
Sorting Algorithms, Heap Sort, Linear-Time Sort, Median and Order Statistics

1 Sorting

1.1 Problem Statement

You are given a sequence of n numbers <a_1, a_2, ..., a_n>. You need to output a reordering (permutation) of the input sequence as <a'_1, a'_2, ..., a'_n>, such that a'_1 ≤ a'_2 ≤ ... ≤ a'_n.

1.2 Insertion Sort

Idea: Just like we sort a hand of cards: start with 0 cards in hand, take a card and put it in the right position; take the next card and insert it in its right position; continue until all the cards are in their correct positions.

Algorithm: insertionsort(A)
Input: A - the sequence of numbers <a_1, a_2, ..., a_n> to be sorted
Output: none, but A will be sorted

for j = 2..n {
    // insert A[j] into its correct position within A[1, 2, ..., j]
    key = A[j]
    i = j - 1
    while i > 0 and A[i] > key {
        A[i + 1] = A[i]
        i = i - 1
    }
    A[i + 1] = key
}
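The pseudocode translates almost directly into runnable code. Below is a minimal sketch in Python (the function name insertion_sort is ours); since Python lists are 0-indexed, the outer loop starts at index 1 rather than 2.

def insertion_sort(a):
    """Sort the list a in place, inserting each a[j] into the sorted prefix a[0..j-1]."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift prefix elements larger than key one slot to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key

a = [5, 2, 4, 6, 1, 3]
insertion_sort(a)
print(a)  # [1, 2, 3, 4, 5, 6]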

Proof of Correctness: We will use the following loop invariant: before the start of the iteration with j = k, the subarray A[1, 2, ..., k - 1] is sorted. We prove the loop invariant by induction.

Base case: The loop invariant is true before the start of the first iteration, when k = 2. At that point the subarray consists only of A[1], which is trivially sorted.

Induction Hypothesis: Let the loop invariant be true for j = k - 1. We will show that the loop invariant is true for j = k.

Induction Step: As the loop invariant is true for j = k - 1, before the (k - 1)th iteration A[1, 2, ..., k - 2] is sorted. In the (k - 1)th iteration, we take A[k - 1] and insert it into its right position. Therefore, before the start of the iteration with j = k, A[1, 2, ..., k - 1] is sorted.

Overall proof: By the invariant, before the start of the iteration with j = n, A[1, 2, ..., n - 1] is sorted. During the iteration j = n, A[n] is inserted into its correct position, and therefore at the end of this iteration A[1, 2, ..., n] is sorted.

Analysis:

Best case: When the given array is already sorted, each time we take a new element we need to compare only once. This happens (n - 1) times, so the best case uses (n - 1) comparisons.

Worst case: When the given array is in reverse-sorted order, each time we take a new element we need to compare it with all the elements already placed. The worst-case number of comparisons is therefore 1 + 2 + ... + (n - 1) = (n - 1)n/2. As we are interested in worst-case complexity, we say that insertion sort has complexity O(n^2).

1.3 Merge Sort

Idea: Use divide and conquer. Divide the given sequence into two equal subsequences, sort each of them recursively (conquer), and merge the two sorted subsequences (combine).

Algorithm: mergesort(A, p, r)
Input: A - the sequence of numbers A[1, 2, ..., n], and indices p and r such that we need to sort the elements A[p, p + 1, ..., r]
Output: none, but the elements A[p, p + 1, ..., r] are sorted

if p < r then {
    q = ⌊(p + r)/2⌋
    mergesort(A, p, q)
    mergesort(A, q + 1, r)
    merge(A, p, q, r)
}

The merge procedure takes an array A and three indices p, q, r such that A[p, p + 1, ..., q] and A[q + 1, q + 2, ..., r] are sorted, and rearranges A so that A[p, p + 1, ..., r] is sorted.

merge(A, p, q, r)
Input: A - the sequence of numbers A[1, 2, ..., n], and indices p, q, and r such that A[p, p + 1, ..., q] and A[q + 1, q + 2, ..., r] are sorted
Output: none, but A[p, p + 1, ..., r] is sorted

Create two arrays L and R such that L consists of A[p, p + 1, ..., q] and R consists of A[q + 1, q + 2, ..., r].
Add a very large number (a sentinel, ∞) as the last element of both L and R.
i = 1; j = 1  // i and j are used to iterate through L and R respectively
for k = p, p + 1, ..., r {
    if L[i] ≤ R[j] then {
        A[k] = L[i]; i = i + 1
    } else {
        A[k] = R[j]; j = j + 1
    }
}

Proof of Correctness of the merge operation: We need to show that merge takes two sorted sequences and combines them into one sorted sequence. Note that at each step, merge takes the smallest remaining number and puts it in the correct position.

Proof of Correctness of mergesort: Use induction on the number of elements in the array.

Base step: Check that mergesort works correctly when the number of elements in the array is 1 or 2.

Induction Hypothesis: Assume that mergesort works correctly whenever the number of elements in the array is less than n.

Induction Step: We need to show that mergesort works correctly when the number of elements in the array is n. mergesort splits the array into two subarrays, each of size < n, and calls mergesort on them (so by the hypothesis both subarrays end up sorted). Then merge combines the two sorted lists correctly. Hence mergesort works correctly when the number of elements in the array is n.

Analysis: The running time satisfies T(n) = 2T(n/2) + Θ(n). Solving this, the complexity of mergesort is Θ(n log n).
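A minimal runnable sketch of the same scheme in Python (0-indexed, inclusive bounds, with math.inf playing the role of the sentinel; the names merge and merge_sort are ours):

import math

def merge(a, p, q, r):
    """Merge the sorted slices a[p..q] and a[q+1..r] (inclusive) in place."""
    left = a[p:q + 1] + [math.inf]    # sentinel spares us end-of-list checks
    right = a[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1

def merge_sort(a, p, r):
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)

a = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(a, 0, len(a) - 1)
print(a)  # [1, 2, 2, 3, 4, 5, 6, 7]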

1.4 Quick Sort

Idea: Quicksort also uses divide and conquer. Choose an element x of the array as the pivot, and partition the array into two subarrays: L, which has all elements ≤ x, and R, which has all elements > x. Sort L and R recursively using quicksort; the result is L sorted, followed by x, followed by R sorted.

quicksort(A, p, r)
Input: A - the sequence of numbers A[1, 2, ..., n], and indices p and r such that we need to sort the elements A[p, p + 1, ..., r]
Output: none, but A[p, p + 1, ..., r] is sorted

if p < r then {
    q = partition(A, p, r)
    // partition returns q such that A[q] is the pivot used to partition and A[q] is in its correct position
    quicksort(A, p, q - 1)
    quicksort(A, q + 1, r)
}

partition(A, p, r)
Input: A - the sequence of numbers A[1, 2, ..., n], and indices p and r delimiting the subarray A[p, p + 1, ..., r] to be partitioned
Output: q such that all elements in A[p, p + 1, ..., q - 1] are ≤ A[q], and all elements in A[q + 1, q + 2, ..., r] are > A[q]

x = A[r]  // we choose A[r] as the pivot
i = p     // i is the index where we will place the next number ≤ x
for j = p..r - 1 {
    if A[j] ≤ x then {
        swap A[j] and A[i]
        i = i + 1
    }
}
swap A[i] and A[r]
return i
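A minimal runnable sketch of this partition scheme in Python (0-indexed, last element as pivot; function names are ours):

def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p  # next slot for an element <= x
    for j in range(p, r):
        if a[j] <= x:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[r] = a[r], a[i]  # place the pivot between the two parts
    return i

def quicksort(a, p, r):
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)

a = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(a, 0, len(a) - 1)
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8]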

Proof of Correctness: We can show by induction, as for mergesort, that quicksort sorts correctly.

Analysis: T(n) = T(n_1) + T(n - n_1 - 1) + n, where n_1 is the size of the first partition. The worst case of quicksort occurs when the pivot always partitions the array into two parts such that the size of one of them is (n - 1) and the size of the other is 0 (that is, when the pivot is the largest element). The best case occurs when the pivot always partitions the array into two equal halves. From the above, we say the complexity of quicksort is O(n^2) and Ω(n log n).
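Expanding the recurrence in the two extreme cases makes these bounds concrete (treating T(0) and T(1) as constants):

Worst case, n_1 = n - 1 at every level:
    T(n) = T(n - 1) + n = T(n - 2) + (n - 1) + n = ... = 1 + 2 + ... + n = n(n + 1)/2 = Θ(n^2).

Best case, n_1 = n/2 at every level:
    T(n) = 2T(n/2) + Θ(n) = Θ(n log n), the same recurrence (and bound) as mergesort.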

2 HeapSort

Rooted Tree: A rooted tree has one node designated as the root. A node can have 0 or more children. Every node except the root has exactly 1 parent; the root has no parent. From the root, you can reach every node in the tree. The depth of the root is 0; the depth of a non-root node is the depth of its parent + 1. The height of the tree is the maximum depth over all nodes.

Binary Tree: A rooted tree where a node can have a maximum of 2 children. We distinguish between the left child and the right child of a node. It is possible for a node to have a right child when it has no left child.

(Binary) Heap: A binary tree which has no empty space and which satisfies the heap property. "No empty space" means all levels except the leaf level are completely filled, and the leaf level is filled from the left up to some point, with the rest empty.

Heap property: A max-heap has the property that the value of a node is ≥ the values of its children. A min-heap has the property that the value of a node is ≤ the values of its children. The maximum element of a max-heap is at its root; the minimum element of a min-heap is at its root.

Heaps are used for priority queues. Consider a scheduler that always wants to run the maximum-priority job. The operations that need to be supported are extractmax to extract the job with the highest priority, insert to insert a new job, and heapify to restore the heap after the highest-priority job is removed. All of these can be done in O(log n). We will also look at how heaps can be used to sort in O(n log n), and how to build a heap from a given set of numbers in Θ(n).

A heap of n elements can be stored as an array A such that A[1] is the root, A[2i] and A[2i + 1] are the children of A[i], and A[⌊i/2⌋] is the parent of A[i].
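In a 0-indexed language the same arithmetic shifts by one. A small sketch of the index helpers, assuming 0-indexed arrays (the function names are ours):

def parent(i): return (i - 1) // 2  # 1-indexed: ⌊i/2⌋
def left(i):   return 2 * i + 1    # 1-indexed: 2i
def right(i):  return 2 * i + 2    # 1-indexed: 2i + 1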

We will look at the different heap operations and their costs one by one. The first is maxheapify, which takes a binary tree in which the left and right subtrees of the root are max-heaps, and turns the whole tree into a max-heap. (Note: We actually require the following shape properties as well. Let h_l denote the height of the left subtree and h_r denote the height of the right subtree. We require that h_l is either = h_r or = h_r + 1. If h_l = h_r, we require the left subtree to be complete (every non-leaf node has exactly 2 children). If h_l = h_r + 1, we require the right subtree to be complete.)

maxheapify(A, i)
Input: A - the input array, and i - we want the tree rooted at A[i] to be a heap. We require that the trees rooted at A[2i] and at A[2i + 1] are both heaps.
Output: none, but the tree rooted at A[i] is a heap

if (2i ≤ n and A[2i] > A[i]) then largest = 2i; else largest = i
if (2i + 1 ≤ n and A[2i + 1] > A[largest]) then largest = 2i + 1
if (largest ≠ i) then {
    swap A[i] and A[largest]
    maxheapify(A, largest)
}

Analysis: The complexity is O(log n). (Note that in the best case the complexity can be constant, if the tree rooted at A[i] is already a heap.)

The following algorithm builds a max-heap from a given array A[1, 2, ..., n] of numbers.

buildmaxheap(A)
Input: A - the array A[1, 2, ..., n]
Output: none, but A will be a max-heap

for i = ⌊n/2⌋..1
    maxheapify(A, i)

Proof of Correctness: Note that the elements A[⌊n/2⌋ + 1, ..., n] are leaves and hence are already max-heaps. At every step, the algorithm takes a non-leaf node whose children are already roots of max-heaps and builds a max-heap rooted there using maxheapify. Hence it is correct.

Analysis: As the complexity of maxheapify is O(log n), we immediately obtain that the complexity of buildmaxheap is O(n log n). But we can do a better analysis.

Observation: The cost of maxheapify at a node is proportional to the height of the tree rooted at that node. At lower levels of the tree there are more nodes, but the heights of their subtrees are smaller. Consider a node at height k in a tree of height h. The cost of maxheapify for such a node is O(k), and the number of nodes at height k is at most 2^(h-k) = 2^h / 2^k. Therefore the total cost is

    Σ_{k=1}^{h} k · 2^(h-k) = 2^h · Σ_{k=1}^{h} k/2^k ≤ 2^h · Σ_{k=1}^{∞} k/2^k = 2 · 2^h.

As 2^h = Θ(n), this analysis shows that the overall complexity of buildmaxheap is Θ(n).

Let us now examine how we can sort using heaps; we give the intuition only. First build the heap. The maximum element of the heap is the largest element of A. Take the last element of the heap, put it at the root in place of the maximum, and call maxheapify(A, 1). This gives a new heap with one element fewer, whose root is the second largest element of A. Repeat this until only one element is left in A.

Analysis: buildmaxheap has complexity Θ(n). We then call maxheapify n times, and the complexity of each maxheapify is O(log n). Therefore the overall complexity is O(n log n). Note that even in the best case heapsort has complexity Ω(n).
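A minimal runnable sketch of all three routines in Python (0-indexed; the heap size is passed explicitly so that heapsort can shrink the heap; the function names are ours):

def max_heapify(a, i, n):
    """Sift a[i] down until the subtree rooted at i is a max-heap.
    Assumes the subtrees rooted at its children are already max-heaps;
    n is the number of elements of a that belong to the heap."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < n and a[l] > a[i] else i
    if r < n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, n)

def build_max_heap(a):
    """Heapify the non-leaf nodes bottom-up; the leaves are already 1-element heaps."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)

def heapsort(a):
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        # The root holds the maximum of the remaining heap; move it to its
        # final position and restore the heap property on the shrunken heap.
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)

a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapsort(a)
print(a)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]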

Let us now consider how to insert an element into the heap; again, we give the intuition only. Place the new element in the first free position of A, say n + 1. Check whether its value is greater than its parent's; if yes, swap them. Continue until we reach the root of the heap. The complexity of insert is O(log n).

3 Linear-Time Sort

Note that the above sorting algorithms are comparison sorts: they compare two numbers to decide which one is smaller than the other. One can show that every comparison sort algorithm has worst-case complexity Ω(n log n). However, if we make some assumptions about the input, we can sort in less time. An example is counting sort, where we assume that the input values are in the range [0, k] with k = O(n). In this case, we keep a bucket for each value in the range [0, k] (which is k + 1 buckets). We scan the input, and for each element we increment the counter of the corresponding bucket. After the entire input is scanned, we need only scan the buckets, read the count associated with each bucket, and output the sorted list. This algorithm takes Θ(n) time.
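A minimal counting-sort sketch in Python, following the bucket-count description above (the function name is ours; it assumes non-negative integer keys bounded by k):

def counting_sort(a, k):
    """Sort a list of integers drawn from the range [0, k] in Θ(n + k) time."""
    count = [0] * (k + 1)          # one bucket per possible value
    for x in a:
        count[x] += 1              # tally occurrences
    out = []
    for value, c in enumerate(count):
        out.extend([value] * c)    # emit each value as many times as it was seen
    return out

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]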

4 Median and Order Statistics

If we are given a sorted array of n elements, we can find the ith smallest element (called the ith order statistic) in Θ(1) steps. You can also check whether an element with value k is present in O(log n) time using binary search. The algorithm is given below.

binarysearch(A, p, q, k)
Input: A[p, p + 1, ..., q] - the sorted array, and k - the number to search for in A[p, p + 1, ..., q]
Output: true if k is present, otherwise false

if p > q then return false;
if p = q then {
    if A[p] = k then return true; else return false;
}
if A[⌊(p + q)/2⌋] = k then return true;
if A[⌊(p + q)/2⌋] > k then
    return binarysearch(A, p, ⌊(p + q)/2⌋ - 1, k);
else
    return binarysearch(A, ⌊(p + q)/2⌋ + 1, q, k);

The worst-case time complexity of the above algorithm is given by the recurrence T(n) = T(n/2) + Θ(1), whose solution is T(n) = O(log n).
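The same search as an iterative Python sketch (equivalent behavior, 0-indexed; the function name is ours):

def binary_search(a, k):
    """Return True iff k occurs in the sorted list a. Iterative, O(log n)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == k:
            return True
        if a[mid] > k:
            hi = mid - 1  # k can only be in the left half
        else:
            lo = mid + 1  # k can only be in the right half
    return False

print(binary_search([1, 3, 4, 7, 9], 7))  # True
print(binary_search([1, 3, 4, 7, 9], 5))  # False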

We now define the selection problem: given an array A of n distinct numbers (not necessarily sorted) and a number i with 1 ≤ i ≤ n, find the ith smallest element of A.

Note that we know how to find the minimum and the maximum in Θ(n) time. How do we solve the problem for an arbitrary i? We can do it in linear time; the intuition is as follows. Find a number x to serve as a pivot, and partition A into two sets based on x. Now we need to look into only one of the two sets. This algorithm hits its worst case when the pivot is a bad one - say, the largest element. To ensure that we can do selection in linear time, we must choose a good pivot. Here is how, assuming for simplicity that n is a power of 5.

1. Divide A into n/5 groups, each having 5 elements.
2. Find the median of each of the n/5 groups.
3. Find the median of these n/5 medians recursively, using the selection algorithm itself. Use this median of medians as the pivot.
4. Partition A according to the pivot, and recursively solve the selection problem on the correct partition. (A runnable sketch follows the analysis below.)

Let us examine why this pivot is a good one. First observe that (n/5 - 1)/2 of the group medians are > the median of medians; similarly, (n/5 - 1)/2 of the group medians are < the median of medians. Each group median that is greater than the median of medians brings along the two elements of its group that are larger still, so at least 3(n/5 - 1)/2 = 3n/10 - 3/2 elements of A are > the median of medians; symmetrically, at least 3n/10 - 3/2 elements of A are < the median of medians. This means that when we call selection again, the size of the partition is at most n - (3n/10 - 3/2) - 1 = 7n/10 + 1/2.

We can therefore write the recurrence for the worst-case complexity as T(n) = T(n/5) + T(7n/10 + 1/2) + Θ(n). Solving this, we get that the time complexity of the above algorithm is Θ(n).
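A compact Python sketch of this scheme (the name select and the list-based partitioning are our illustrative choices; the text's in-place partition would work equally well). Like the problem statement, it assumes the elements are distinct:

def select(a, i):
    """Return the ith smallest element of a (1-indexed), in worst-case O(n) time.
    Median-of-medians pivot as described above; groups of constant size
    are handled by sorting, which costs O(1) per group."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Steps 1-2: medians of groups of 5 (the last group may be smaller).
    medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2]
               for j in range(0, len(a), 5)]
    # Step 3: median of medians, found recursively.
    pivot = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around the pivot and recurse into the correct side.
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    if i <= len(less):
        return select(less, i)
    elif i == len(less) + 1:
        return pivot
    else:
        return select(greater, i - len(less) - 1)

a = [9, 1, 8, 2, 7, 3, 6, 4, 5, 0]
print(select(a, 4))  # 3 (the 4th smallest)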