Sorting Revisited
So far we have studied two algorithms to sort an array of numbers:

              Insertion Sort   Merge Sort
  Worst case  Θ(n^2)           Θ(n lg n)
  Best case   Θ(n)             Θ(n lg n)

What is the advantage of merge sort? Answer: O(n lg n) worst-case running time.
What is the advantage of insertion sort? It sorts in place (only a constant number of elements of the input array are ever stored outside the array), and it is fast on small inputs (so it runs fast in practice when the array is nearly sorted).
Next: Heapsort, which combines the advantages of both previous algorithms.
Heaps
A heap can be seen as a complete binary tree:

        8
      /   \
     7     9
    / \   / \
   3   2 4   1

What makes a binary tree complete? Is the example above complete?
The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point. The book calls them nearly complete binary trees; you can think of the unfilled slots as null pointers.
Heaps
In practice, heaps are usually implemented as arrays:

  index: 1  2  3  4  5  6  7  8  9  10
  A =    8  7  9  3  2  4  1            (heap stored in A[1..7])

An array A representing a heap is an object with two attributes: length[A], the number of elements in the array, and heap-size[A], the number of elements of the heap stored within array A. Clearly, heap-size[A] ≤ length[A].

To represent a complete binary tree as an array:
  The root node is A[1]
  Node i is A[i]
  The parent of node i is A[⌊i/2⌋] (note: integer divide)
  The left child of node i is A[2i]
  The right child of node i is A[2i + 1]
Referencing Heap Elements
The root of the tree is A[1], and given the index i of a node, the indices of its parent PARENT(i), left child LEFT(i), and right child RIGHT(i) can be computed simply:

  PARENT(i) { return ⌊i/2⌋ }
  LEFT(i)   { return 2*i }
  RIGHT(i)  { return 2*i + 1 }

The Heap Property
Heaps also satisfy the heap property:
  A[PARENT(i)] ≥ A[i] for all nodes i > 1
In other words, the value of a node is at most the value of its parent; the largest element in a heap is stored at the root.
Definitions:
  The height of a node in the tree = the number of edges on the longest downward path from that node to a leaf
  The height of a tree = the height of its root
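The index arithmetic above can be checked with a short sketch. Python lists are 0-indexed, so one convention (an assumption here, not part of the slides) is to leave slot 0 unused and store the heap in A[1..n]:

```python
# 1-indexed parent/child index arithmetic from the slides.
# Assumption: A[0] is an unused placeholder so the heap occupies A[1..n].

def parent(i):
    return i // 2          # integer divide, as the slides note

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

# Example: the 7-element heap from the earlier slide, stored from index 1.
A = [None, 8, 7, 9, 3, 2, 4, 1]
assert A[left(1)] == 7 and A[right(1)] == 9   # children of the root
assert A[parent(5)] == 7                      # node 5 (value 2) has parent 7
```

Note that with this convention the heap property reads `A[parent(i)] >= A[i]` for every `i > 1`, exactly as stated above.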
Heap Height
Since a heap of n elements is based on a complete binary tree, its height is Θ(lg n) (in other words, a binary tree of height h has at most 2^h leaves). This is nice: it turns out that the basic heap operations take at most time proportional to the height of the heap.

Basic Heap Procedures
The HEAPIFY procedure, which runs in O(lg n) time, is the key to maintaining the heap property.
The BUILD-HEAP procedure, which runs in linear time, produces a heap from an unordered input array.
The HEAPSORT procedure, which runs in O(n lg n) time, sorts an array in place.
Heap Operations: Heapify()
Heapify() maintains the heap property.
Given: a node i in the heap with children l and r
Given: two subtrees rooted at l and r, assumed to be heaps
Problem: the subtree rooted at i may violate the heap property
Action: let the value of the parent node float down so the subtree rooted at i satisfies the heap property

HEAPIFY(A, i)
1  l ← LEFT(i)
2  r ← RIGHT(i)
3  if l ≤ heap-size[A] and A[l] > A[i]
4     then largest ← l
5     else largest ← i
6  if r ≤ heap-size[A] and A[r] > A[largest]
7     then largest ← r
8  if largest ≠ i
9     then exchange A[i] ↔ A[largest]
10         HEAPIFY(A, largest)
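The pseudocode above can be rendered directly in Python. This is a sketch, assuming the 1-indexed convention (A[0] unused) and passing the heap size explicitly rather than as an attribute of A:

```python
# HEAPIFY: float A[i] down until the subtree rooted at i is a max-heap.
# Precondition: the subtrees rooted at LEFT(i) and RIGHT(i) are heaps.

def heapify(A, i, heap_size):
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[i]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # exchange with larger child
        heapify(A, largest, heap_size)        # recurse into disturbed subtree

A = [None, 4, 7, 9, 3, 2, 8, 1]   # root 4 violates the heap property
heapify(A, 1, 7)
# A is now [None, 9, 7, 8, 3, 2, 4, 1]: 4 floated down two levels
```

Each recursive call moves one level down the tree, which is where the O(h) = O(lg n) bound comes from.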
Heap Operations: Heapify()
At each step, the largest of the elements A[i], A[LEFT(i)], and A[RIGHT(i)] is determined, and its index is stored in largest. If A[i] is largest, then the subtree rooted at node i is a heap and the procedure terminates. Otherwise, one of the two children has the largest element, and A[i] is swapped with A[largest], which causes node i and its children to satisfy the heap property. The node indexed by largest, however, now has the original value A[i], and thus the subtree rooted at largest may violate the max-heap property. Consequently, HEAPIFY must be called recursively on that subtree.

Heapify() Example
[Figure: the array A = [4, 10, 14, 7, 9, 3, 2, 8, 1] and the binary tree it represents, before HEAPIFY is called on the out-of-place root value.]
Heapify() Example (continued)
[Figures: successive states of the tree and array as HEAPIFY floats the out-of-place value down, one exchange per panel, until the heap property is restored.]
Analyzing Heapify()
Fixing up the relationships among i, l, and r takes Θ(1) (constant) time.
If the heap rooted at i has n elements, how many elements can the subtrees rooted at l or r have? Answer: at most 2n/3 (the worst case occurs when the last row of the tree is exactly half full).
So the time taken by Heapify() satisfies the recurrence

  T(n) ≤ T(2n/3) + Θ(1)

By case 2 of the Master Theorem, T(n) = O(lg n).
Alternatively, we can characterize the running time of HEAPIFY on a node of height h as O(h).
Heap Operations: BuildHeap()
We can build a heap in a bottom-up manner by running Heapify() on successive subarrays.
Fact: for an array of length n, the elements A[⌊n/2⌋ + 1 .. n] are all leaves of the tree, so each is already a 1-element heap.
So: go backwards through the array from ⌊n/2⌋ down to 1, calling Heapify() on each node. This order of processing guarantees that the children of node i are heaps when i is processed.

BUILD-HEAP(A)          ▷ Produces a heap from an unordered input array
1  heap-size[A] ← length[A]
2  for i ← ⌊length[A]/2⌋ downto 1
3      do HEAPIFY(A, i)
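The same bottom-up construction in Python might look as follows (a sketch using the 1-indexed convention with A[0] unused; HEAPIFY is repeated here so the example is self-contained, and the sample array is the 10-element one from the textbook's BUILD-HEAP figure):

```python
# HEAPIFY as in the earlier slide: float A[i] down within a heap of the
# given size.
def heapify(A, i, heap_size):
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[i]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

# BUILD-HEAP: nodes n/2+1..n are leaves (1-element heaps already), so
# heapify each internal node from n/2 down to 1.
def build_heap(A):
    heap_size = len(A) - 1                    # A[0] is unused
    for i in range(heap_size // 2, 0, -1):    # for i <- n/2 downto 1
        heapify(A, i, heap_size)

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(A)
# A is now [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]: a max-heap
```

The loop order matters: by the time node i is heapified, its children (at larger indices) have already been processed, satisfying HEAPIFY's precondition.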
BuildHeap() Example
Work through the example A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
Build-Heap: Worked Example
The operation of BUILD-HEAP, showing the data structure before the call to HEAPIFY in line 3 of BUILD-HEAP. (a) A 10-element input array A and the binary tree it represents; the loop index i refers to node 5 before the call HEAPIFY(A, i). (b) The data structure that results; the loop index i for the next iteration refers to node 4. (c)-(e) Subsequent iterations of the for loop in BUILD-HEAP. Observe that whenever HEAPIFY is called on a node, the two subtrees of that node are both heaps. (f) The heap after BUILD-HEAP finishes.

Analyzing BuildHeap()
Each call to Heapify() takes O(lg n) time, and there are O(n) such calls (specifically, ⌊n/2⌋). Thus the running time is O(n lg n).
Is this a correct asymptotic upper bound? Yes.
Is this an asymptotically tight bound? No: a tighter bound is O(n). How can we prove this?
Analyzing BuildHeap(): Tight
Heapifying a subtree takes O(h) time, where h is the height of the subtree (h = O(lg m), with m = the number of nodes in the subtree). The height of most subtrees is small.
Fact: an n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h.
CLR 7.3 uses this fact to prove that BuildHeap() takes O(n) time.

Heapsort
Given BuildHeap(), an in-place sorting algorithm is easily constructed:
  The maximum element is at A[1]
  Discard it by swapping it with the element at A[n]
  Decrement heap-size[A]; A[n] now contains the correct value
  Restore the heap property at A[1] by calling Heapify()
  Repeat, always swapping A[1] with A[heap-size[A]]
Heapsort
HEAPSORT(A)
1  BUILD-HEAP(A)
2  for i ← length[A] downto 2
3      do exchange A[1] ↔ A[i]
4         heap-size[A] ← heap-size[A] - 1
5         HEAPIFY(A, 1)

Example: Heapsort
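In Python the whole algorithm is only a few lines on top of the earlier routines (a sketch, again assuming the 1-indexed A[0]-unused convention; heapify and build_heap are repeated so the block runs on its own):

```python
def heapify(A, i, heap_size):
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[i]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

def build_heap(A):
    heap_size = len(A) - 1
    for i in range(heap_size // 2, 0, -1):
        heapify(A, i, heap_size)

# HEAPSORT: repeatedly move the current maximum A[1] to its final slot A[i],
# shrink the heap by one, and restore the heap property at the root.
def heapsort(A):
    build_heap(A)
    for i in range(len(A) - 1, 1, -1):   # for i <- n downto 2
        A[1], A[i] = A[i], A[1]          # exchange A[1] <-> A[i]
        heapify(A, 1, i - 1)             # heap now occupies A[1..i-1]

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(A)
# A[1:] is now [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Note the sort is in place: the shrinking heap and the growing sorted suffix share the one array, which is exactly the advantage inherited from insertion sort.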
Example: Heapsort
The operation of HEAPSORT: (a) the max-heap data structure just after it has been built by BUILD-HEAP; (b)-(j) the max-heap just after each call of HEAPIFY in line 5, with the value of i at that time shown (only lightly shaded nodes remain in the heap); (k) the resulting sorted array A.

Analyzing Heapsort
The call to BuildHeap() takes O(n) time, and each of the n - 1 calls to Heapify() takes O(lg n) time. Thus the total time taken by HeapSort() is O(n) + (n - 1) · O(lg n) = O(n) + O(n lg n) = O(n lg n).
Application: Priority Queues
The heap data structure itself has enormous utility. One of the most popular applications of a heap is its use as an efficient priority queue. A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key. One application of max-priority queues is scheduling jobs on a shared computer: the priority queue keeps track of the jobs to be performed and their relative priorities.
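The two core priority-queue operations, inserting a keyed element and extracting the one with the largest key, fall straight out of the heap machinery above. The sketch below is an assumption about how they might be coded (the slides stop short of giving them); function names follow the textbook's HEAP-INSERT / HEAP-EXTRACT-MAX, and the 1-indexed A[0]-unused convention is kept:

```python
def heapify(A, i, heap_size):
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[i]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

# HEAP-INSERT: append the key at the bottom, then float it UP toward the
# root until the heap property holds. O(lg n).
def heap_insert(A, key):
    A.append(key)
    i = len(A) - 1
    while i > 1 and A[i // 2] < A[i]:
        A[i // 2], A[i] = A[i], A[i // 2]
        i //= 2

# HEAP-EXTRACT-MAX: the maximum is at the root; replace it with the last
# element, shrink the heap, and heapify down. O(lg n).
# Precondition (unchecked here): the heap is nonempty.
def heap_extract_max(A):
    max_key = A[1]
    A[1] = A[len(A) - 1]
    A.pop()
    heapify(A, 1, len(A) - 1)
    return max_key

Q = [None]               # empty max-priority queue
for job_priority in (3, 5, 1):
    heap_insert(Q, job_priority)
# heap_extract_max(Q) returns 5: the highest-priority job runs first
```

In the job-scheduling application described above, each key is a job's priority, so the scheduler always dequeues the highest-priority pending job in O(lg n) time.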