CS800: Algorithms, Fall 201
Nov 22, 201
Quiz 2 Practice
Total Points: 0. Duration: 1 hr

1. ( , 10) points. Binary Heap.

(a) The following is a sequence of elements presented to you (in order from left to right): 10, , 8, 7, , 1, 4. Show the result of forming a binary min heap by adding each new element to the end and heapifying up.

(b) Give an O(n) algorithm for building a binary min heap from a sequence of n unsorted elements. You must include a proof of correctness and of the run time.

Solution:

(a) First heap, from inserting 10:

    10

Insert the next element and heapify up:

    10

Insert 8 and heapify up:

    10 8

Insert 7 and heapify up:

    7 8 10

Insert the next element and heapify up:

    8 10 7

Insert 1 and heapify up:

    1 10 7 8

Insert 4 and heapify up:

    1 4 10 7 8

Note that the heap is actually an array, and the binary tree is an interpretation of it. As we add elements, we fill out the implicit binary tree one level at a time, from left to right.
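For concreteness, here is a minimal sketch (Python; added for illustration, not part of the original solution) of inserting into an array-based min heap with heapify up. It assumes the usual 0-indexed layout in which the parent of index i is (i - 1) // 2.

    def heap_insert(heap, x):
        # Append the new element at the end of the array, then sift it up
        # until its parent is no larger (min-heap property).
        heap.append(x)
        i = len(heap) - 1
        while i > 0:
            parent = (i - 1) // 2
            if heap[parent] <= heap[i]:
                break
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent

    # Example usage:
    h = []
    for x in [10, 8, 7, 1, 4]:
        heap_insert(h, x)
    print(h)   # level-order array of the resulting min heap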
(b) Take the unsorted array and interpret it as a heap, which does not yet satisfy the heap property. We will fix each subtree one at a time, starting with the leaves, until the entire binary tree satisfies the heap property. We do this by calling HeapifyDown(n), HeapifyDown(n - 1), ..., HeapifyDown(2), HeapifyDown(1), where HeapifyDown(i) means to call heapify down on the i-th element of the array. A leaf subtree already satisfies the heap property. When we call HeapifyDown(i) and the i-th element is not a leaf node, we have already called HeapifyDown(2i) and HeapifyDown(2i + 1), so those two subtrees already satisfy the heap property. Thus, inductively, the subtree rooted at node i satisfies the heap property after the call, and at the end the entire binary tree satisfies the heap property.
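A minimal sketch of this bottom-up construction (Python; added for illustration). It is 0-indexed, so the children of index i are 2i + 1 and 2i + 2, and the last internal node has index n // 2 - 1.

    def heapify_down(a, i, n):
        # Sift a[i] down until neither child is smaller (min-heap property).
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and a[left] < a[smallest]:
                smallest = left
            if right < n and a[right] < a[smallest]:
                smallest = right
            if smallest == i:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest

    def build_min_heap(a):
        # Fix subtrees bottom-up: leaves are already heaps, so start at the
        # last internal node and work back to the root.
        n = len(a)
        for i in range(n // 2 - 1, -1, -1):
            heapify_down(a, i, n)
        return a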
We must finally show that the run time is O(n). We know that a single heapify operation takes time O(log n) in the worst case, so doing n of them is certainly no worse than O(n log n). However, we are going to show that this particular series of heapify operations takes time only O(n). The reason is that the cost of heapify depends on how far down the tree the item has to sift, and for most items the subtree below them is small enough that the heapify operation takes time O(1).

We formalize this as follows. Let h be the height of an item, counting from the leaves, which have height 0. A heapify-down operation on such a node takes at most O(h) operations, and the number of nodes at height h is at most n / 2^{h+1}. Thus the total run time of our algorithm is

    \sum_{h=0}^{\log n} h \cdot \frac{n}{2^{h+1}}
      = \frac{n}{2} \sum_{h=0}^{\log n} \frac{h}{2^h}
      \le \frac{n}{2} \left( \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \cdots \right)
      = O(n),

where we used the fact that \sum_{h \ge 0} h / 2^h converges to a constant.
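One way to pin down that constant (this step is an addition, not part of the original solution) is the standard identity \sum_{h \ge 1} h x^h = x / (1 - x)^2 for |x| < 1, obtained by differentiating the geometric series. At x = 1/2 it gives

    \sum_{h \ge 0} \frac{h}{2^h} = \frac{1/2}{(1 - 1/2)^2} = 2,

so the total work is at most (n/2) \cdot 2 = n, which is O(n) as claimed.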
2. (3, 12) points. Given a sequence of distinct numbers a_1, a_2, ..., a_n, an increasing subsequence is a (not necessarily contiguous) subsequence a_{i_1}, a_{i_2}, ..., a_{i_k}, with i_1 < i_2 < ... < i_k, such that a_{i_j} < a_{i_{j+1}}.

(a) Give an example of a sequence of 4 distinct numbers such that neither the sequence nor the reverse of the sequence has an increasing subsequence of length 3.

(b) Give an O(n^2) algorithm for finding the longest increasing subsequence in a given array of n distinct numbers. Give a proof of correctness and run time analysis for your algorithm.

Solution:

(a) Consider the sequence [10, 11, 1, 2]. It has no increasing subsequence of length 3. Its reverse, [2, 1, 11, 10], also has none.

(b) Let L(j) be the length of the longest increasing subsequence ending at a_j. Either this subsequence is just a_j, with length 1, or it consists of a_j preceded by some list of elements S. If a_i is the last element of S, then S is a longest increasing subsequence ending at a_i, with length L(i) (otherwise we could replace S by a longer one). We do not know which value of i is best, so we take the max over all i with i < j and a_i < a_j. This gives the recursion

    L(j) = 1 + max{ L(i) : i < j and a_i < a_j },

where the max of an empty set is taken to be 0. The length of the longest increasing subsequence of the whole sequence is the max over all the L(j) values, since we do not know where the best subsequence ends. The recursion requires no base case beyond the empty-set convention. Thus our algorithm is:

    L(1) = 1
    for j = 2, 3, ..., n do
        L(j) = 1 + max{ L(i) : i < j and a_i < a_j }
    end for
    return max{ L(j) : 1 ≤ j ≤ n }

Computing L(j) takes time O(j), since it is a max over (in the worst case all of) the values L(1), L(2), ..., L(j - 1). Thus the entire for loop takes time O(1 + 2 + ... + (n - 1)) = O(n^2). The final max is over L(1), L(2), ..., L(n) and takes time O(n). Thus the total run time is O(n^2).
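A short runnable version of this dynamic program (Python; added for illustration, and extended with predecessor pointers so that the subsequence itself, not just its length, is recovered):

    def longest_increasing_subsequence(a):
        n = len(a)
        if n == 0:
            return []
        L = [1] * n        # L[j]: length of the longest increasing subsequence ending at a[j]
        prev = [-1] * n    # prev[j]: previous index on that subsequence, or -1
        for j in range(n):
            for i in range(j):
                if a[i] < a[j] and L[i] + 1 > L[j]:
                    L[j] = L[i] + 1
                    prev[j] = i
        # Reconstruct by walking back from the index where the best subsequence ends.
        j = max(range(n), key=lambda k: L[k])
        out = []
        while j != -1:
            out.append(a[j])
            j = prev[j]
        return out[::-1]

    print(longest_increasing_subsequence([3, 10, 2, 1, 20]))   # [3, 10, 20]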
3. (9, ) points. Bellman-Ford and Floyd-Warshall.

(a) The transitive closure of a directed graph G is a graph on the same node set with an arc (u, v) exactly when there is a directed path from u to v in G. Give an O(|V|^3) algorithm for finding the transitive closure of a given directed graph. Give a proof of correctness and runtime analysis.

(b) Give an algorithm that is faster than O(|V|^3) for finding all-pairs shortest paths in a sparse (with |E| = |V| log |V|) directed graph with positive edge weights. (Hint: recall that Dijkstra has a running time of O(|E| log |V|).)

Solution:

(a) Create a weighted version of the graph, where each edge has weight 0. Then, for a pair of nodes (u, v), the shortest-path distance from u to v is 0 if and only if there is a path from u to v, and it is +∞ otherwise. Thus we do the following: perform all-pairs shortest paths on the weighted graph (for example with Floyd-Warshall), which takes time O(|V|^3). Then create a new unweighted graph: for every pair of nodes (u, v), add an edge from u to v if and only if the shortest-path distance from u to v is 0. The resulting graph is the transitive closure of G. It takes O(|V| + |E|) time to construct the weighted graph, O(|V|^3) to compute all-pairs shortest paths, and O(|V|^2) to build the final transitive closure, so the total run time is O(|V|^3).

(b) For every node u, compute single-source shortest paths from u using Dijkstra's algorithm. One such computation runs in time O(|E| log |V|) = O(|V| log^2 |V|). Doing it for every node gives a total run time of O(|V|^2 log^2 |V|), which is faster than O(|V|^3).
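Two illustrative sketches (Python; added for illustration, neither is part of the original solution). For part (a), running the Floyd-Warshall triple loop directly on a boolean reachability matrix is equivalent to the 0-weight reduction above and makes the O(|V|^3) bound explicit; adj is assumed to be an n x n 0/1 adjacency matrix:

    def transitive_closure(adj):
        n = len(adj)
        # reach[u][v] is True when a directed path from u to v has been found.
        reach = [[bool(adj[u][v]) for v in range(n)] for u in range(n)]
        for k in range(n):                  # allow k as an intermediate node
            for u in range(n):
                for v in range(n):
                    if reach[u][k] and reach[k][v]:
                        reach[u][v] = True
        return reach

For part (b), repeated Dijkstra with a binary heap (Python's heapq); graph is assumed to map each node (e.g., an integer label) to a list of (neighbor, weight) pairs with positive weights:

    import heapq

    def all_pairs_dijkstra(graph):
        dist = {}
        for s in graph:
            d = {s: 0}
            pq = [(0, s)]                   # (distance, node) priority queue
            while pq:
                du, u = heapq.heappop(pq)
                if du > d.get(u, float("inf")):
                    continue                # stale queue entry, skip it
                for v, w in graph[u]:
                    if du + w < d.get(v, float("inf")):
                        d[v] = du + w
                        heapq.heappush(pq, (d[v], v))
            dist[s] = d                     # shortest distances from s
        return dist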
4. (8, 7) points. Median and Quicksort.

(a) Give pseudo-code for a space-efficient implementation of quicksort with random pivot selection. You are not charged for the input array of n numbers, which you are allowed to modify. How much additional space does your implementation use?

(b) Give the recurrence for the linear-time median-finding algorithm, assuming we use samples of size 5. Solve the recurrence.

Solution:

(a) Our goal is O(log n) additional space, achieved by sorting the array in place. To do this, we treat the array A as a global variable. Our subroutine QuickSort takes as input indices i and j and sorts A[i : j] in place. To partition around a pivot value in place, we first swap the pivot to the end of the subarray. Then we do a linear sweep of the subarray, swapping elements that are smaller than the pivot to the front. We keep a count of how many such elements we have swapped, both to know the size of the first subarray we will recurse on and to avoid un-swapping elements we have already moved to the front.

    QuickSort(i, j)
        if i ≥ j then
            return
        end if
        p = a uniformly random index in [i, j]
        pivot = A[p]
        swap A[j] and A[p]
        count = 0
        for k = i, ..., j - 1 do
            if A[k] < pivot then
                swap A[k] and A[i + count]
                count = count + 1
            end if
        end for
        swap A[i + count] and A[j]
        QuickSort(i, i + count - 1)
        QuickSort(i + count + 1, j)

To see that the additional space is O(log n), there are two parts to analyze. First, we maintain a counter for the size of the first subarray; since this size can be as large as n, the counter needs O(log n) bits. Second, the recursive calls use stack space to record where to return after a call finishes (if the procedure ended with only one recursive call we could avoid this with tail-call optimization, but it ends with two). With randomly selected pivots the expected depth of the divide-and-conquer tree is O(log n), so the implementation uses O(log n) additional space in expectation.
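A runnable rendering of the pseudocode above (Python; added for illustration, with the same index arithmetic):

    import random

    def quicksort(A, i=0, j=None):
        # Sorts A[i..j] in place; the only extra space is the recursion stack
        # and a few local indices.
        if j is None:
            j = len(A) - 1
        if i >= j:
            return
        p = random.randint(i, j)            # random pivot index in [i, j]
        pivot = A[p]
        A[p], A[j] = A[j], A[p]             # park the pivot at the end
        count = 0
        for k in range(i, j):
            if A[k] < pivot:
                A[k], A[i + count] = A[i + count], A[k]
                count += 1
        A[i + count], A[j] = A[j], A[i + count]   # put the pivot in its final place
        quicksort(A, i, i + count - 1)
        quicksort(A, i + count + 1, j)

    a = [3, 7, 1, 9, 4, 2]
    quicksort(a)
    print(a)   # [1, 2, 3, 4, 7, 9]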
(b) Formally, we are given an array A of n elements and want to find the element of rank n/2. For each group of 5 elements, finding the median of that group takes O(1) time. There are n/5 such groups, so doing this takes O(n) time. Then, to find the median of the medians, we recursively apply our algorithm to the array of n/5 medians. Once we have found the median of medians, we know that it is approximately the median of the entire set: it is greater than about 3/10 of the elements and less than about 3/10 of the elements. We partition around this element and recurse on the subarray that contains the element of our desired rank, adjusting the desired rank to the rank that element must have within that subarray. Whichever side we recurse on has size no larger than 7n/10, since both subarrays have size at least 3n/10. Thus our algorithm makes two recursive calls, one on the set of medians of size n/5 and one on a subarray of size at most 7n/10. This yields the recurrence relation

    T(n) = T(n/5) + T(7n/10) + αn.

To prove that T(n) = O(n), we show that T(n) ≤ cn for some constant c satisfying α ≤ 0.1c. This is true for n = 1, since the algorithm trivially needs no operations. Suppose it is true for all smaller n. Then

    T(n) = T(n/5) + T(7n/10) + αn
         ≤ c(n/5) + c(7n/10) + αn
         = c(n/5 + 7n/10) + αn
         = 0.9cn + αn
         ≤ 0.9cn + 0.1cn
         = cn.

Thus, inductively, T(n) ≤ cn = O(n).
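For completeness, a compact Python sketch of this selection algorithm (added for illustration; it uses auxiliary lists rather than the in-place partition described in part (a), and the base-case cutoff of 25 is an arbitrary choice). Here select(a, k) returns the element of rank k, 0-indexed:

    def select(a, k):
        # Worst-case linear-time selection via median of medians (groups of 5).
        if len(a) <= 25:
            return sorted(a)[k]
        # Median of each group of 5 (the last group may be smaller).
        groups = [a[i:i + 5] for i in range(0, len(a), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        pivot = select(medians, len(medians) // 2)
        # Partition around the median of medians.
        lo = [x for x in a if x < pivot]
        hi = [x for x in a if x > pivot]
        eq = len(a) - len(lo) - len(hi)
        if k < len(lo):
            return select(lo, k)
        if k < len(lo) + eq:
            return pivot
        return select(hi, k - len(lo) - eq)

    # Example: the median (rank n // 2) of the numbers 1..31 in reverse order
    vals = list(range(31, 0, -1))
    print(select(vals, len(vals) // 2))   # 16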