Algorithm Design Techniques Assignment 5: Solutions

(1) Port Authority. [This problem is more commonly called the Bin Packing Problem.]

(a) Suppose K = 3 and (w_1, w_2, w_3, w_4) = (2, 2, 1, 1). The optimal solution clearly uses two trucks (load {w_1, w_3} and {w_2, w_4}), but this greedy algorithm uses three trucks: w_2 does not fit alongside w_1, and w_4 does not fit alongside {w_2, w_3}.

(b) Suppose the greedy algorithm uses t trucks, where truck i carries a total of W_i ≤ K units of weight. Now for any i < j ≤ t we must have that W_i + W_j > K, otherwise the greedy algorithm would not have used truck j at all. In particular, there can be at most one truck, say truck i, with W_i ≤ K/2. All the other trucks carry weight at least K/2. Renumber the trucks so that i = 1. Then

W_1 + W_2 + W_3 + ⋯ + W_t = (W_1 + W_2) + W_3 + ⋯ + W_t > K + W_3 + ⋯ + W_t ≥ K + (t − 2) · K/2 = t · K/2.

But this total weight must be carried by every solution, and each truck carries at most K, so the optimal solution must use more than t/2 trucks. So the greedy algorithm is a 2-approximation algorithm.

(2) 3-D Matching. There is a simple greedy 3-approximation algorithm. Start with M = ∅ and examine the triples in any order. Add a triple (x, y, z) to M if it does not intersect any triple already in M, that is, if none of the vertices x, y or z currently appears in a triple in M. Repeat until all triples have been examined.

Assume the optimal collection M* has size k. Then |M| ≥ k/3. Suppose not. Then the triples in M contain fewer than 3 · k/3 = k vertices. These vertices therefore intersect at most k − 1 of the k disjoint triples in M*. So there is at least one triple in M* that the greedy algorithm would still have added to M, a contradiction. So this is a 3-approximation algorithm.
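The greedy procedure for problem (2) can be sketched as follows; the function name and triple representation are illustrative, not part of the original solution.

```python
# Sketch of the greedy 3-approximation for 3-D matching from problem (2).
# Triples are (x, y, z) tuples; vertices are arbitrary hashable labels.
def greedy_3d_matching(triples):
    matching = []
    used = set()  # vertices already covered by a chosen triple
    for triple in triples:
        # Add the triple only if it is disjoint from everything chosen so far.
        if all(v not in used for v in triple):
            matching.append(triple)
            used.update(triple)
    return matching
```

Any examination order yields the 3-approximation guarantee, since the counting argument above never uses the order in which the triples are considered.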
(3) Vertex Cover. As mentioned in the hint, it is enough to prove that G has a matching of cardinality at least |W|/2, where W is the set of non-leaf nodes of the DFS tree output by the algorithm. Indeed, it is obvious that W is a vertex cover. Now since the size of a vertex cover in a graph G is at least the size of any matching in the graph, we get that if there is a matching M in G of size at least |W|/2, then |W| ≤ 2|M| ≤ 2 · OPT_C, where OPT_C is the size of an optimal vertex cover.

So let us show that the hint holds. We claim that every non-leaf node can be matched in the tree. Indeed, take the root and match it with any of its children and remove these two nodes. Now we are left with a forest, and we can continue in a similar manner on each of the trees left: we can always match the root to one of its children if the tree has at least two vertices, and if we get a tree with only one node then that node is a leaf in the DFS tree. Hence every non-leaf node is matched, and since each matching edge covers at most two non-leaf nodes, the matching has size at least |W|/2. This completes the proof.

(4) Travelling Salesman Problem - Greedy Fit.

(a) We want to show that the cost of the minimum cost tour H is at least the cost of the minimum spanning tree T. Now H is a cycle with n edges, so if we delete any one of its edges, say e, we obtain a Hamiltonian path P = H − {e}. Now a path contains no cycles, and a Hamiltonian path uses every vertex; thus a Hamiltonian path is a spanning tree. But then P must have cost at least that of the minimum spanning tree T. Thus c(T) ≤ c(P) = c(H − e) ≤ c(H), as desired.

(b) To analyse our algorithm, consider the minimum spanning tree computed using Prim's algorithm. The algorithm starts by adding some arbitrary vertex v_1 to V(T) and builds a tree T whose edge set E(T) is initially empty. In each iteration it finds a minimum cost edge (i, j) where i ∈ V(T) and j ∉ V(T) and adds it to E(T). The algorithm terminates when all the vertices have been added. Observe that the Greedy Fit algorithm adds nodes in exactly the same order as Prim's algorithm (and each choice of node is made because it corresponds to the same edge added by Prim).
Of course Greedy Fit creates a cycle out of these nodes rather than a tree. Let us see how well the algorithm does when the TSP instance has metric costs, that is, for any three vertices u, v, w ∈ V we have the triangle inequality c_{uw} ≤ c_{uv} + c_{vw}.
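As a minimal sketch (assuming a complete metric cost matrix given as a dictionary of dictionaries; all names are illustrative), the Greedy Fit rule, which follows Prim's selection order and splices each new vertex into the cycle next to its attachment point, might look like:

```python
# Sketch of Greedy Fit from problem (4): vertices are added in Prim order,
# and each new vertex v is spliced into the tour immediately after the
# tree vertex j it attaches to.  cost[u][v] is a symmetric metric cost.
def greedy_fit(cost, start):
    tour = [start]
    while len(tour) < len(cost):
        # Prim step: cheapest edge (j, v) from the current tour to outside it.
        j, v = min(((u, w) for u in tour for w in cost if w not in tour),
                   key=lambda e: cost[e[0]][e[1]])
        # Fit v into the cycle right after its attachment point j.
        tour.insert(tour.index(j) + 1, v)
    return tour  # read cyclically: the tour closes back to `start`
```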
Now consider the working of the algorithm. Suppose we add v_i in step i; this was because Prim's algorithm selected the edge e_i = (v_i, v_j) for some j < i. The Greedy Fit algorithm then fitted v_i in between v_j and v_{j+1} (indices taken modulo i − 1). Therefore the cost of the tour T_i is larger than the cost of T_{i−1} by exactly

c_{j,i} + c_{i,j+1} − c_{j,j+1} ≤ c_{j,i} + (c_{i,j} + c_{j,j+1}) − c_{j,j+1} = 2c_{j,i} = 2c_{e_i}.

Here the inequality follows from the triangle inequality. Thus the cost of the final tour T_n is at most

∑_{i=2}^{n} 2c_{e_i} = 2c(T) ≤ 2 · OPT.

Thus we have a 2-approximation algorithm.

(5) Travelling Salesman Problem - Greedy Walk.

(a) Consider a minimum cost tour on n vertices in a graph G = (V, E), where n is even. This tour has n edges. Now colour the edges red and blue in an alternating way as you traverse the tour. (For example, the odd edges are coloured red and the even edges are coloured blue.) This clearly partitions the edges into two sets, red and blue, and each of these colours induces a perfect matching in G. We know that the total cost of these two edge sets is OPT, so the cheaper of the two sets has cost at most OPT/2. The minimum cost perfect matching cannot cost any more than this. So the minimum cost perfect matching in G gives a lower bound on the minimum cost tour (in fact, twice the minimum cost perfect matching is a lower bound).

(b) Let M = {e_1, e_2, ..., e_{n/2}} be the minimum cost perfect matching in G. Now label the vertices such that e_i = (v_{2i−1}, v_{2i}) for each e_i ∈ M. Now consider our Greedy Walk algorithm. It either reaches v_{2i−1} before it reaches v_{2i}, or it reaches v_{2i} before it reaches v_{2i−1}. Without loss of generality, assume we have the former case. Then when the algorithm selects the departing edge out of v_{2i−1} it has the option of choosing e_i, because it has not visited v_{2i} yet. Because the greedy algorithm picks the minimum cost edge between v_{2i−1} and an unvisited vertex, if it does not select e_i then it selects an even cheaper edge.
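The Greedy Walk rule analysed in (b) can be sketched as follows (a hypothetical implementation; the cost representation and function name are assumptions, not part of the original solution):

```python
# Sketch of the Greedy Walk (nearest-neighbour) heuristic from problem (5).
# cost[u][v] is the (symmetric) edge cost; start is an arbitrary vertex.
def greedy_walk(cost, start):
    tour = [start]
    visited = {start}
    current = start
    while len(visited) < len(cost):
        # Depart along the cheapest edge to a not-yet-visited vertex.
        current = min((v for v in cost[current] if v not in visited),
                      key=lambda v: cost[current][v])
        tour.append(current)
        visited.add(current)
    return tour  # returning to `start` closes the tour
```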
(c) From (b), the departing edge from one of v_{2i−1} or v_{2i} has cost at most c_{e_i}. Call the vertex in {v_{2i−1}, v_{2i}} with the cheaper departing edge y_i and the one with the more expensive departing edge z_i. From (b), the cost, λ_i say, of the edge out of y_i is at most c_{e_i}. Thus from (a) we have

∑_{i=1}^{n/2} λ_i ≤ ∑_{i=1}^{n/2} c_{e_i} ≤ OPT/2.

Thus we have found a set of half of the edges in the greedy tour H whose total cost is at most OPT/2. The other half of the edges may be a lot more expensive, but we can use a similar argument to find a set of half of the remaining edges (i.e. a quarter of all edges) with total cost at most OPT/2, then a set of half of the remaining edges (i.e. one eighth of all edges) with total cost at most OPT/2, etc. As there are n edges in H, we iterate at most log n times and thus we get an O(log n)-approximation algorithm.

To prove this, remove the vertices y_1, ..., y_{n/2} and just consider the set of vertices Z = {z_1, ..., z_{n/2}}. Find a minimum cost perfect matching on the set Z (you may assume that n is a power of two to simplify the analysis). Again, by the triangle inequality, the optimal tour H' on Z has cost at most the cost of the optimal tour H on V. This tour can be partitioned into two perfect matchings, so one of these matchings, say M', has cost at most OPT/2. Take any edge e = (z, z') in M'. Whichever of these vertices is visited first by the greedy walk has a departing edge of cost at most c_e. Thus, as argued above, half of the vertices in Z have departing edges of total cost at most OPT/2, as desired. The result follows by recursion.

(6) Facility Location. For each location i and each subset U ⊆ V we will have a set S_{i,U} of cost f_i + ∑_{j∈U} d(i, j). We can view S_{i,U} as covering the elements in U. Thus we have a set cover problem. Recall that the greedy algorithm seen in class (at each step choose the set that covers the currently uncovered items at the lowest average cost) gives an O(log n) approximation guarantee, where |V| = n. There is one problem with this though.
The number of sets is exponential, as there are an exponential number of subsets U ⊆ V. So the running time of the greedy approximation algorithm will be exponential if we examine all the sets in order to find the one with the lowest average coverage cost. However, at each step there is a quicker way to find the set with the lowest average cost. At time t, let the uncovered elements be v_1, v_2, ..., v_{n_t}.
Assume they are ordered in increasing distance from vertex i. Then the only sets S_{i,U} that could possibly be of lowest average coverage cost are {v_1}, {v_1, v_2}, ..., {v_1, v_2, ..., v_k}, ..., {v_1, v_2, ..., v_{n_t}}. To see this, consider S_{i,U} for some subset U where v_a ∉ U, v_b ∈ U and a < b. But then S_{i, U ∖ {v_b} ∪ {v_a}} covers the same number of uncovered elements as S_{i,U} but at no greater average coverage cost, because d(i, v_a) ≤ d(i, v_b). There are at most n_t ≤ n such sets centred at i. The same is true of any other vertex j (except that the ordering of the vertices will differ, because the distances to j differ from the distances to i), so there are only n² sets we need to consider to find the one with the lowest average coverage cost. Thus we can implement each step of the algorithm in polynomial time by examining only these sets.
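A minimal sketch of this polynomial-time greedy, assuming for illustration that locations and clients share the index set {0, ..., n−1}, with opening costs f[i] and distances d[i][j] (all names hypothetical):

```python
# Sketch of the greedy set-cover view of facility location from problem (6).
# At each step, for every location i we only evaluate prefix sets of the
# uncovered clients sorted by distance from i, as argued above.
def greedy_facility_location(f, d):
    n = len(f)                      # locations and clients both indexed 0..n-1
    uncovered = set(range(n))
    chosen = []                     # list of (location, frozenset(clients)) picks
    while uncovered:
        best = None                 # (average cost, location, client prefix)
        for i in range(n):
            order = sorted(uncovered, key=lambda j: d[i][j])
            cost = f[i]
            for k, j in enumerate(order, start=1):
                cost += d[i][j]     # cost of S_{i,U} for the k-element prefix U
                avg = cost / k
                if best is None or avg < best[0]:
                    best = (avg, i, order[:k])
        _, i, prefix = best
        chosen.append((i, frozenset(prefix)))
        uncovered -= set(prefix)
    return chosen
```

Each round scans at most n² candidate sets, matching the counting argument above.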