Name: Lirong TAN

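1. (15 pts)

(a) Define what is a shortest s-t path in a weighted, connected graph G.

A shortest s-t path is a path from vertex $s$ to vertex $t$ whose sum of edge weights is minimum among all s-t paths.

(b) Give the pseudocode for Dijkstra's algorithm for solving the shortest s-t path. Analyze its time complexity.

Implemented with a Fibonacci heap. Maintain a set $S$ of explored nodes, for which the shortest path has been determined, and a set $Q$ of unexplored nodes, stored in the heap and keyed by the tentative distance $d(v)$.

    Initialize S = {}, Q = {}, d(s) = 0, d(v) = infinity for all other vertices v
    Insert all vertices into Q
    While (IsEmpty(Q) = False) do
        u = ExtractMin(Q)
        add u to S
        for v in u.neighbors()
            if d(u) + w(u, v) < d(v)
                d(v) = d(u) + w(u, v)
                changekey(Q, v, d(v))
    return d(t)

With a Fibonacci heap, the $n$ ExtractMin operations cost $O(\log n)$ amortized each, and the at most $m$ changekey operations cost $O(1)$ amortized each. Therefore, the total time is $O(m + n \log n)$.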
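The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the Fibonacci-heap version analyzed in the answer: Python's standard `heapq` is a binary heap without a changekey operation, so the sketch pushes a fresh entry and skips stale ones, which gives $O((n + m) \log n)$ rather than $O(m + n \log n)$. The adjacency-list format and the sample graph are assumptions made for the example.

```python
import heapq

def dijkstra(graph, s):
    """Shortest distances from s.

    graph: dict mapping node -> list of (neighbor, weight) pairs,
    with nonnegative weights (assumed input format).
    """
    dist = {v: float("inf") for v in graph}
    dist[s] = 0
    explored = set()          # the set S of the pseudocode
    heap = [(0, s)]           # the queue Q, keyed by tentative distance
    while heap:
        d, u = heapq.heappop(heap)
        if u in explored:
            continue          # stale entry left behind by an earlier push
        explored.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                # stands in for changekey: push the improved key
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical directed example graph.
graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("t", 1)],
    "b": [("a", 2), ("t", 6)],
    "t": [],
}
distances = dijkstra(graph, "s")
```

Here `distances["t"]` is the length of the shortest s-t path (s, b, a, t).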

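(c) Prove that Dijkstra's algorithm is correct.

Claim: for each node $u$ added to $S$, $d(u) = \min_{e=(v,u):\, v \in S} \big( d(v) + w(e) \big)$ is the length of the shortest path from $s$ to $u$.

We prove this by induction on $|S|$.

$|S| = 1$: $S = \{s\}$, and $d(u)$ is obviously the shortest path from $s$ to $u$ if $u$ is selected to be added to $S$ next, because $u$ is reached by the cheapest edge leaving $s$ and all weights are nonnegative.

$|S| \ge 1$: let $v$ be the next node to be added to $S$. We need to prove that $d(v)$ is still the shortest path from $s$ to $v$. There are two possibilities for a path $P$ from $s$ to $v$:

1. The path only contains nodes in $S$ (apart from $v$). In this situation, $d(v)$ is the shortest path since we update $d$ every time we add a new node to $S$.

2. The path contains nodes in both $S$ and $V - S$, as shown in the figure. Suppose there is a node $y \notin S$ on path $P$. Let $(x, y)$ be the first edge in $P$ that leaves $S$, and let $P'$ be the subpath to $x$. Due to the nonnegative property of weights, we ignore the weights from $y$ to $v$, thus $\ell(P) \ge \ell(P') + w(x, y)$.

Due to the hypothesis that $d(x)$ is the shortest path from $s$ to $x$ for all $x$ in $S$, $\ell(P') + w(x, y) \ge d(x) + w(x, y)$. From the definition of $d(y)$, we have $d(x) + w(x, y) \ge d(y)$. Node $v$ is the next node to be added to $S$, so $d(y) \ge d(v)$. Chaining these gives

$\ell(P) \ge \ell(P') + w(x, y) \ge d(x) + w(x, y) \ge d(y) \ge d(v)$.

To sum up, any path from $s$ to $v$ will be at least as expensive as $d(v)$.

2. (15 pts) For the following questions you may assume that all edge weights are unique.

(a) Define what is a minimum cost spanning tree of a weighted, connected graph G.

Let $G = (V, E)$. A minimum cost spanning tree is a subset of edges $T \subseteq E$ such that $(V, T)$ is a spanning tree whose sum of edge weights is minimized.

(b) Give the pseudocode for Kruskal's Algorithm. Analyze its time complexity.

(Implemented with union-find data structure)

    G = (V, E)
    Kruskal(G, C)                  % C is the set of weight costs
        n = |V|, m = |E|
        Sort edge weights so that c(e_1) <= c(e_2) <= ... <= c(e_m)
        T = {}
        for each (v in V) make a set containing singleton v
        for i = 1 to m
            (u, v) = e_i
            If (u and v are in different sets)
                T = T + {e_i}
                merge the sets containing u and v
        return T

Time complexity: sorting the edges takes $O(m \log m)$ and creating the singleton sets takes $O(n)$. For the union-find data structure, Find requires $O(1)$ time, and changing the label of a node also takes $O(1)$ time. For the Union($S_1$, $S_2$) operation, which relabels the smaller set, each node undergoes at most $\log n$ label changes, so all unions together take $O(n \log n)$. Since $m \le n^2$, we have $O(m \log m) = O(m \log n)$. Therefore, the total time complexity is $O(m \log n)$.

(c) Prove that Kruskal's algorithm is correct.

The correctness of Kruskal's algorithm follows from two properties of the MST.

Cut property. Let $S$ be any subset of nodes, and let $e$ be the minimum cost edge with exactly one endpoint in $S$. Then the MST contains $e$.

Cycle property. Let $C$ be any cycle, and let $f$ be the maximum cost edge belonging to $C$. Then the MST does not contain $f$.

Kruskal's algorithm considers edges in ascending order of weight. For each edge $e = (u, v)$: if $u$ and $v$ are in the same component, adding $e$ to $T$ creates a cycle on which $e$ is the most expensive edge, because every edge already in $T$ is cheaper, so we discard $e$ according to the cycle property. Otherwise, let $S$ be the component containing $u$; $e$ is the cheapest edge leaving $S$, because any cheaper edge leaving $S$ would already have been added, so we insert $e$ into $T$ according to the cut property. Every edge kept is therefore in the MST and every edge discarded is not, so the sum of edge weights of $T$ is minimized.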
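Kruskal's algorithm from part (b) can be sketched in Python as below. Note one substitution: instead of the label-array union-find described in the answer, this sketch uses the tree-based variant with union by size and path halving, which is easier to write and has the same $O(m \log n)$ overall bound because sorting dominates. The node numbering `0..n-1`, the `(weight, u, v)` edge format, and the sample graph are assumptions for illustration.

```python
def kruskal(n, edges):
    """Return the MST edges of a connected graph.

    n: number of nodes, labeled 0..n-1 (assumed labeling).
    edges: list of (weight, u, v) tuples with unique weights.
    """
    parent = list(range(n))   # each node starts as its own singleton set
    size = [1] * n

    def find(x):
        # Follow parent pointers to the set representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):          # ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # different sets: no cycle is created
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru                # merge the smaller set into the larger
            size[ru] += size[rv]
            tree.append((w, u, v))
    return tree

# Hypothetical example graph on 4 nodes.
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
```

In this example the edges of weight 4 and 5 are discarded because each would close a cycle on which it is the most expensive edge.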

3. (15 pts) Give a linear time O(n+m) algorithm to test whether or not a particular edge e (of a graph G with n nodes and m edges) belongs to a minimum spanning tree.

Idea: My algorithm is based on the cycle property of the MST. As in question 2, we suppose that all edge weights are unique.

Claim: If edge $e$ has the maximum cost in some cycle that includes it, edge $e$ does not belong to the minimum spanning tree. Otherwise, it belongs to the MST.

Proof: The first part of the claim follows directly from the cycle property. We prove the second part by contradiction. Hypothesis: edge $e$ is not the most expensive edge in any cycle that includes it, but it does not belong to the MST. If we add edge $e$ to the MST, there will be a cycle. Since edge $e$ is not the most expensive edge on this cycle, we can replace the most expensive one with edge $e$ and still have a spanning tree. Thus, the sum of edge weights for the new tree will be lower than the original one, which contradicts the definition of the MST. Thus, we reject the hypothesis.

Algorithm: For implementation, we first traverse all the edges once and delete the edges whose weights are larger than that of edge $e$. This step takes $O(m)$ time. In the new graph, all the remaining edges other than $e$ are cheaper than edge $e$. If there exists a cycle including edge $e$ in the new graph, edge $e$ does not belong to the MST. Otherwise, it does.

Then, we search the new graph. Suppose the nodes linked by edge $e$ are $u$ and $v$. We start from node $u$ and find all the nodes reachable from node $u$ without using edge $e$ itself. Each time we visit a node, we determine whether this node is $v$. If it is, a cycle is detected. Furthermore, we mark a node as visited the first time we visit it, which avoids visiting the same node twice. This step takes $O(n + m)$ time. Therefore, the total time complexity is $O(n + m)$.

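    G = (V, E), e = (u, v)
    belongto(G, e)
        For each e' in E
            If (c(e') > c(e)) delete e' from E
        Q = {u}, u.visited() = true
        While (Q is not empty) do
            x = Q.dequeue()
            for each vertex y adjacent to x        % do not traverse edge e itself
                If (y.visited() = true) continue
                else If (y = v)
                    print "edge e does not belong to a MST"
                    return
                else
                    y.visited() = true
                    Add y to Q
        print "edge e belongs to a MST"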
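A minimal Python sketch of this test follows, assuming unique edge weights and an undirected adjacency-list input (both the input format and the function name `edge_in_mst` are hypothetical). Rather than physically deleting the heavier edges, the sketch skips every edge whose weight is at least that of $e$ during the search, which also excludes $e$ itself.

```python
from collections import deque

def edge_in_mst(adj, u, v, cost):
    """Return True if edge (u, v) of weight `cost` belongs to the MST.

    adj: dict mapping node -> list of (neighbor, weight) pairs for an
    undirected graph with unique weights (assumed input format).
    """
    visited = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y, w in adj[x]:
            # Ignore e itself and every heavier edge, and never revisit a node.
            if w >= cost or y in visited:
                continue
            if y == v:
                return False   # cheaper u-v path exists: e is the max edge on a cycle
            visited.add(y)
            queue.append(y)
    return True

# Hypothetical triangle with edge weights 1, 2 and 3.
adj = {
    0: [(1, 1), (2, 3)],
    1: [(0, 1), (2, 2)],
    2: [(1, 2), (0, 3)],
}
```

Each node is enqueued at most once and each adjacency entry is examined at most once, so the search runs in $O(n + m)$ time.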

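4. (15 pts) Consider the Huffman Coding algorithm.

(a) Trace the action of the algorithm for the frequency table [(a,10); (b,7); (c,3); (d,5); (e,9); (f,2); (g,3); (h,2)]

Sort the symbols according to frequency: [(f,2);(h,2);(c,3);(g,3);(d,5);(b,7);(e,9);(a,10)]

Repeatedly merging the two lowest-frequency nodes gives: f+h = 4, c+g = 6, (fh)+d = 9, (cg)+b = 13, e+(fhd) = 18, a+(cgb) = 23, and finally 18+23 = 41 at the root.

(b) Using the code from above, encode the word 'decade'.

'decade' = 0100010001101000, which splits as d = 010, e = 00, c = 1000, a = 11, d = 010, e = 00.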
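The trace can be checked with a short Python sketch of Huffman's algorithm, shown below under the assumption that ties are broken by insertion order. The exact bit patterns depend on which child is labeled 0 and on tie-breaking, so they need not match the codewords used in the answer; the codeword lengths, however, are determined by the merge order for this frequency table.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code for a non-empty dict symbol -> frequency."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lowest-frequency nodes into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[node] = prefix or "0"

    assign(heap[0][2], "")
    return codes

freqs = {"a": 10, "b": 7, "c": 3, "d": 5, "e": 9, "f": 2, "g": 3, "h": 2}
codes = huffman_codes(freqs)
encoded_length = sum(len(codes[ch]) for ch in "decade")
```

The resulting codeword lengths are 2 bits for a and e, 3 bits for b and d, and 4 bits for c, f, g and h, so 'decade' encodes to 16 bits, the same length as the encoding given in part (b).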

(c) True or False. Every 2-tree (i.e., binary tree where each internal node has 2 children) is a Huffman tree for some set of frequencies. Justify your answer.

True. Take an arbitrary 2-tree whose depth is $D$, and suppose there are $n$ leaf nodes in this 2-tree. For leaf node $i$, the depth value is $d_i$ and the frequency is $f_i$. Let $f_i = 2^{D - d_i} x$. From the property of frequencies, we have $\sum_{i=1}^{n} f_i = 1$. Thus, we can obtain the value of $x$ (since $\sum_i 2^{-d_i} = 1$ in a 2-tree, $x = 2^{-D}$), and consequently the frequency $f_i = 2^{-d_i}$ for each leaf node. The frequency value for an internal node is the sum of its two children. This set of frequencies $(f_1, f_2, \dots, f_n)$ makes the original 2-tree a Huffman tree.

5. (15 pts) Recall the "Selecting Breakpoints" problem from slide-25 in Lecture 3 notes. Suppose instead of optimizing the number of stops, we wanted to minimize the total cost in fuel for the entire trip. Assume that there is a parameter C representing the fuel capacity of the car. Also, at each input breakpoint b, we know the price per gallon p_b of fuel at that stop.

(a) Design and verify a greedy algorithm to find the minimum cost trip.

Idea: At each stop $i$, add the amount of fuel $a_i$ that brings the tank to exactly the level needed to reach the next stop whose price is lower than that of the current stop. Due to the limitation of fuel capacity, we fill the tank to $C$ if it requires more than $C$ to reach the next lower-price stop.

Algorithm:

stops: $b_1, b_2, \dots, b_n$ (we take $b_n$ to be the destination)
prices: $p_1, p_2, \dots, p_n$
the amount of fuel added at each stop: $a_1, a_2, \dots, a_n$
the amount of fuel remaining on arrival at each stop: $r_1, r_2, \dots, r_n$
the amount of fuel needed between any two stops $i < j$: $g(i, j)$, proportional to the distance $b_j - b_i$

    for (i = 1; i <= n-1; i++)        % for each stop determine the amount of fuel to be added
        r_i = r_{i-1} + a_{i-1} - g(i-1, i)   % g(i-1, i) is the amount of fuel needed from stop (i-1) to i
                                              % r_0 = 0, a_0 = 0, g(0, 1) = 0
        a_i = min(C, g(i, n)) - r_i           % used if no lower-price stop is found
        for (j = i+1; j <= n; j++)
            if (p_j < p_i)                    % find the next lower-price stop
                If (g(i, j) <= C) a_i = g(i, j) - r_i   % g(i, j) is the amount of fuel needed from stop i to stop j
                else a_i = C - r_i
                break
        a_i = max(a_i, 0)                     % never remove fuel from the tank
        Add fuel with the amount of a_i at stop i

Verification:

Claim: the greedy algorithm is optimal. We prove it by contradiction.

Suppose the greedy algorithm and the optimal solution OPT make the same refueling decisions up to some stop $k$, for the largest possible value of $k$. After stop $k$, the greedy algorithm selects stop $i$ as the next stop at which fuel is bought, and OPT selects stop $j \ne i$. Stop $j$ can only be prior to stop $i$, since at stop $k$ we add fuel up to $g(k, i)$, which is the exact amount that is required from stop $k$ to stop $i$, and we cannot go any farther than stop $i$. Stop $i$ is the first stop that is cheaper than stop $k$, so every stop between them is more expensive than stop $i$. Thus, $p_j > p_i$. Let the amount of fuel added at stop $j$ be $x$. Why not add the $x$ gallons at stop $i$ instead of at stop $j$? The car already carries enough fuel to reach stop $i$, so this change to OPT is feasible and achieves a lower cost than OPT, which contradicts the definition of OPT. Repeating this process stop by stop, we find that the greedy algorithm is optimal.

(b) Analyze the complexity of your algorithm.

First, we need to know the amount of fuel needed from stop $i$ to stop $j$. Precomputing the cumulative amounts from the first stop takes $O(n)$ time, after which each $g(i, j)$ is a constant-time lookup. Furthermore, there is a nested loop. For the worst case, the time would be $\sum_{i=1}^{n-1} (n - i) = \frac{n(n-1)}{2} = O(n^2)$. Thus, the total time complexity is $O(n^2)$.

6. (15 pts) Suppose you are given the closing prices of ABC stock over the past n days, and you want to compute the maximum difference in prices over any pair of days such that if you bought low and sold high you could maximize your profit. In other words, you want the maximum difference in which the smaller price precedes the larger. Clearly, this can be done by considering all the differences over each of the $O(n^2)$ (or, n choose 2) pairs of days. You are to design a divide-and-conquer algorithm that runs in time O(n log n) for this problem. Hint: consider using the solution to the closest-pair problem as a model.

Algorithm: Recursively divide the original problem into two halves until there is only one element in each sub-problem, which can be solved in constant time.

For the merging part, we find the lowest price ($p_{\min}$) in the first part and the highest price ($p_{\max}$) in the second part. Compare $d = p_{\max} - p_{\min}$ with the largest profit for each part ($d_1$, $d_2$); the profit for the merged problem is $\max(d_1, d_2, d)$.

Original problem: $P = (p_1, p_2, \dots, p_n)$
Divide: $P_1 = (p_1, \dots, p_{\lfloor n/2 \rfloor})$, $P_2 = (p_{\lfloor n/2 \rfloor + 1}, \dots, p_n)$
Merge: $d = \max(P_2) - \min(P_1)$

Time Complexity: Let the time complexity of the original problem be $T(n)$. With the divide-and-conquer technique, we split the original problem into two halves, and solve each half in time $T(\lfloor n/2 \rfloor)$ or $T(\lceil n/2 \rceil)$. To merge the two halves, we only need to find the lowest price in the first half and the highest price in the second half. Finding the smallest or largest element in a list takes $O(k)$ time, where $k$ is the size of the list. Thus, merging the two halves takes $O(n)$ time, and we have

$T(n) = 2T(n/2) + n = 2\big(2T(n/4) + n/2\big) + n = 4T(n/4) + 2n = \dots = nT(1) + n \log n$.

Since $T(1) = O(1)$, the total time complexity is $T(n) = O(n \log n)$.
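The divide-and-conquer scheme above can be sketched in Python as follows. This is an illustrative sketch with two assumptions that are not in the answer: a sub-problem with a single day has profit 0 (buy and sell on the same day), so a steadily falling price series yields 0, and the function name `max_profit` is hypothetical.

```python
def max_profit(prices, lo=0, hi=None):
    """Largest prices[j] - prices[i] with i <= j, over prices[lo:hi]."""
    if hi is None:
        hi = len(prices)
    if hi - lo <= 1:
        return 0                       # base case: at most one day
    mid = (lo + hi) // 2
    left = max_profit(prices, lo, mid)     # best trade inside the first half
    right = max_profit(prices, mid, hi)    # best trade inside the second half
    # Merge step: buy at the lowest price in the first half,
    # sell at the highest price in the second half (linear scans).
    cross = max(prices[mid:hi]) - min(prices[lo:mid])
    return max(left, right, cross)

# Hypothetical closing prices.
best = max_profit([7, 1, 5, 3, 6, 4])
```

The merge step scans each half once, which matches the $T(n) = 2T(n/2) + O(n)$ recurrence. Returning the minimum and maximum of each half from the recursive calls would remove the scans and bring the running time down to $O(n)$, although the question only asks for $O(n \log n)$.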

7. (10 pts) This final question asks you to write a reasoned paragraph giving your feedback of the class so far. It is open ended, so you can share praise or criticism. You can suggest refinements to the lectures or the lecture notes. You can share your frustrations, or share any insights you have gained along the way.

Overall, excellent course and excellent instructor! In this course, Professor Annexstein has introduced the classic algorithms in various domains, and techniques for designing efficient algorithms. He is passionate, and his lectures are clear and easy to follow. I have learnt a lot from this course. In addition, I have a couple of suggestions for this course. First of all, I would prefer to split one class into two shorter classes. It is hard to stay concentrated for such a long time, and too much material in one class also makes it harder to digest after class. Therefore, I strongly recommend two classes per week instead of one. Second, I would like the instructor to talk a little bit about data structures, such as the B-tree, the binary heap and so on. Data structures are the basis of the implementation of algorithms, and I think they should be briefly covered in this course. Another minor point is that the assignment questions are sometimes not conveyed clearly enough. Take the second assignment for example: it was not clear whether the subsequence had to be consecutive or not. An explanation of that would be appreciated.