Questions... How does one show the first problem is NP-complete? What goes on in a reduction? How hard are NP-complete problems?

Even More NP

Reduction We say that problem A reduces to problem B if there is an efficient algorithm to solve A using a subroutine to solve B. The subroutine for B can be used at unit cost: we have an oracle to answer instances of B. We write $A \le B$ to denote that A reduces to B. Say that A and B can both be solved efficiently. Does A reduce to B?

3-SAT and Independent Set $(x_1 \lor x_4 \lor x_2) \land (x_3 \lor x_4 \lor x_5) \land \cdots$ Do these problems have anything to do with each other? We will show how to solve 3-SAT efficiently, given an oracle which solves Independent Set. In other words, 3-SAT $\le$ Independent Set.

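Reduction Say that the clauses in our formula are: $(x_1 \lor x_4 \lor x_2) \land (x_3 \lor x_4 \lor x_5) \land (x_1 \lor x_2 \lor x_5)$ (negation bars over literals, if any, are not shown). Then we construct the following graph: one triangle per clause, with a vertex for each of its literals, and an edge between any two vertices labeled by a variable and its negation. [Figure: the resulting graph, with three triangles whose vertices are labeled by the literals of the three clauses]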

First Thing to Check The reduction needs to be efficient. Given a formula, can we efficiently construct the desired graph?
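
One way to see that the construction is efficient is to write it out. The following is a minimal sketch, not the lecture's own code: it assumes literals are encoded as nonzero integers (`-i` standing for the negation of `x_i`), that the literals within a clause are distinct, and the function name `build_graph` and the example formula are made up for illustration. The double loop over vertex pairs takes time quadratic in the number of clauses.

```python
from itertools import combinations

def build_graph(clauses):
    """Return (vertices, edges) for the 3-SAT -> Independent Set reduction.

    Each vertex is a pair (clause_index, literal). A literal is a nonzero
    int, with -i standing for the negation of x_i (an assumed encoding).
    Literals within one clause are assumed to be distinct.
    """
    vertices = [(c, lit) for c, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for u, v in combinations(vertices, 2):
        same_clause = u[0] == v[0]      # edges of the clause's triangle
        contradictory = u[1] == -v[1]   # x_i versus its negation
        if same_clause or contradictory:
            edges.add((u, v))
    return vertices, edges

# Hypothetical example formula, chosen only for illustration.
clauses = [(1, -4, 2), (-3, 4, 5), (-1, 2, -5)]
vertices, edges = build_graph(clauses)
print(len(vertices), len(edges))
```

With m clauses the graph has 3m vertices, so it can be built in time polynomial in the size of the formula.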

Reduction Claim: The formula is satisfiable if and only if the graph has an independent set of size equal to the number of clauses. $(x_1 \lor x_4 \lor x_2) \land (x_3 \lor x_4 \lor x_5) \land (x_1 \lor x_2 \lor x_5)$

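Reduction: second part If there is an independent set of size equal to the number of clauses, then the formula is satisfiable. The set must contain exactly one vertex per triangle, and it contains no inconsistencies: it never includes both a literal and its negation. Setting each chosen literal to true therefore satisfies every clause.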
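
This direction of the argument can be sketched as a short procedure. The sketch assumes vertices are `(clause_index, literal)` pairs with literals encoded as nonzero integers (`-i` for the negation of `x_i`); the function names and the example formula are hypothetical. Variables not mentioned by the chosen vertices are set to false, which is an arbitrary choice.

```python
def assignment_from_independent_set(independent_set, num_vars):
    """Build a truth assignment from an independent set with one vertex per clause.

    Because the set is independent, it never contains both a literal and
    its negation, so the assignments below never conflict.
    """
    # Arbitrary default for variables the set says nothing about.
    assignment = {i: False for i in range(1, num_vars + 1)}
    for _, lit in independent_set:
        assignment[abs(lit)] = lit > 0   # make the chosen literal true
    return assignment

def satisfies(clauses, assignment):
    """Check that every clause has at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Hypothetical formula and an independent set with one vertex per clause.
clauses = [(1, -4, 2), (-3, 4, 5), (-1, 2, -5)]
chosen = [(0, 2), (1, 4), (2, 2)]
assignment = assignment_from_independent_set(chosen, num_vars=5)
print(satisfies(clauses, assignment))
```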

The first NP-complete problem We will show that Circuit SAT is NP-complete. [Figure: a Boolean circuit built from AND, OR, and NOT gates over inputs x_1, x_2, x_3 and the constants 0 and 1]

Circuits are Universal Any polynomial time computation can be done by a polynomial size circuit. [Figure: a circuit of AND, OR, and NOT gates with 0/1 values on its wires]

Circuit SAT If a problem is in NP, then there is a polynomial time verification procedure, which recognizes a valid solution. We can represent this verification procedure as a circuit! The input to the circuit is a potential solution. With Circuit SAT we can test if there exists a solution which passes the test!
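
The idea can be illustrated with a toy model, under heavy simplification. The nested-tuple circuit encoding and the names `evaluate` and `circuit_sat` are assumptions made for this sketch, and the exhaustive search is only a stand-in for a Circuit SAT oracle: it takes time exponential in the number of inputs.

```python
from itertools import product

def evaluate(gate, inputs):
    """Evaluate a circuit given as nested tuples on a dict of input bits."""
    op = gate[0]
    if op == "IN":
        return inputs[gate[1]]
    if op == "CONST":
        return gate[1]
    if op == "NOT":
        return not evaluate(gate[1], inputs)
    args = [evaluate(g, inputs) for g in gate[1:]]
    if op == "AND":
        return all(args)
    if op == "OR":
        return any(args)
    raise ValueError(f"unknown gate type: {op}")

def circuit_sat(circuit, names):
    """Try every input; return a satisfying input dict, or None if none exists."""
    for bits in product([False, True], repeat=len(names)):
        inputs = dict(zip(names, bits))
        if evaluate(circuit, inputs):
            return inputs
    return None

# Hypothetical "verifier" circuit: (x1 OR NOT x2) AND x3 AND NOT x1.
verifier = ("AND",
            ("OR", ("IN", "x1"), ("NOT", ("IN", "x2"))),
            ("IN", "x3"),
            ("NOT", ("IN", "x1")))
witness = circuit_sat(verifier, ["x1", "x2", "x3"])
print(witness)
```

A circuit such as `("AND", ("IN", "x1"), ("NOT", ("IN", "x1")))` has no satisfying input, and `circuit_sat` returns `None` for it.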

Coping with NP-hardness NP-hardness is a worst-case notion. An algorithm is judged by its behavior on the most troublesome input. In practice, the instances of a problem that arise are sometimes much easier to solve or amenable to certain heuristics. For satisfiable random 3-SAT instances with 200 variables and 320 clauses, an assignment can be found in ~0.3 seconds.

Approximation Algorithms Approximation algorithms are another approach to NP-hardness that has been very successful. So far we have mostly talked about decision problems: Is a formula satisfiable? Does a graph have an independent set of size k? Is there a salesman tour of length at most w?

Optimization Problems In many of these problems of minimizing or maximizing there is a natural quantity we are after: The size of a largest independent set in a graph. The length of a shortest salesman tour. For other problems like SAT, this is not as clear... We could look at trying to satisfy as many clauses as possible.

Approximating TSP Let's focus on the traveling salesman problem. If finding the optimal tour efficiently is out of reach, can we at least hope to find a tour which is pretty good? A c-approximation algorithm efficiently produces an answer that is guaranteed to be within a factor of c of the optimal: $\mathrm{opt}(x) \le \mathrm{alg}(x) \le c \cdot \mathrm{opt}(x)$ for all inputs x.

Approximating TSP A c-approximation algorithm for the Traveling Salesman Problem would: output a closed tour visiting each city exactly once; for every input set of cities, produce a tour at most a factor of c longer than the optimal; and run in polynomial time.

Triangle Inequality In our example, the distance between cities was the distance on earth as the bird flies. This distance measure satisfies the triangle inequality: $d(x, z) \le d(x, y) + d(y, z)$. [Figure: a triangle on the points x, y, z]

Triangle Inequality A TSP problem does not have to satisfy the triangle inequality... Say our frugal salesman is flying and wants to minimize the cost of the trip. Distance is now dollars. The cost of a ticket from Philadelphia direct to Los Angeles can be more than the cost of a ticket from Philadelphia to Newark to Los Angeles.
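
Whether a given distance table satisfies the triangle inequality can be checked directly. The sketch below assumes distances are stored in a square list-of-lists `d`; the function name and both tables are made up for illustration, and the ticket prices are not real fares.

```python
from itertools import permutations

def satisfies_triangle_inequality(d):
    """Return True if d[x][z] <= d[x][y] + d[y][z] for all distinct x, y, z."""
    n = len(d)
    return all(
        d[x][z] <= d[x][y] + d[y][z]
        for x, y, z in permutations(range(n), 3)
    )

# Made-up distances between three points on a line: a genuine metric.
line = [[0, 2, 5],
        [2, 0, 3],
        [5, 3, 0]]

# Made-up ticket prices: the direct 0 -> 2 fare (400) exceeds the
# two-leg fare through city 1 (50 + 250), so the inequality fails.
fares = [[0, 50, 400],
         [50, 0, 250],
         [400, 250, 0]]

print(satisfies_triangle_inequality(line))
print(satisfies_triangle_inequality(fares))
```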

TSP with triangle inequality TSP remains NP-hard even with this restriction that the distance measure satisfies the triangle inequality. But in this case we can give a 1.5 factor approximation algorithm! This result has not been improved for 35 years. Today we will see how to get a factor of 2 approximation.

Step 1: Compute MST Note: a minimum tour (minus an edge) gives a spanning tree.
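
The note can be written as an inequality. Here $T^{*}$ denotes an optimal tour, $e$ any one of its edges, and $w(\cdot)$ total edge weight; these symbols are introduced only for this sketch, and edge weights are assumed to be nonnegative. Removing $e$ from $T^{*}$ leaves a spanning tree, which can weigh no less than the minimum spanning tree:

```latex
\[
  w(\mathrm{MST}) \;\le\; w(T^{*} \setminus \{e\}) \;\le\; w(T^{*}) \;=\; \mathrm{opt}(x)
\]
```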

Traversing a Tree Now we have a minimum spanning tree. We can make a closed path in this tree visiting every vertex and using each edge twice.

Step 2: Traverse the Tree [Figure: traversal of the minimum spanning tree on an example with vertices numbered 1 to 16]

Step 3: Take Shortcuts This gives us a tour of distance twice the weight of the minimum spanning tree, as we take each edge twice. Problem: We visit vertices more than once! Fix: When you are about to visit a vertex for the second time, skip directly to the next new vertex. By the triangle inequality, the distance will not increase!
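
Steps 1 to 3 can be sketched together as follows. This is an illustrative implementation under stated assumptions, not the lecture's code: the function names, the use of Prim's algorithm for the spanning tree, and the four-city example are choices made here. Walking the doubled tree and skipping already-visited vertices visits the vertices in preorder, so the sketch produces the tour as a preorder traversal of the tree. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math

def mst_parents(dist):
    """Prim's algorithm on a complete graph; returns parent[v] in a tree rooted at 0."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n     # cheapest known edge linking v to the tree
    parent = [None] * n
    best[0] = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return parent

def approx_tsp(dist):
    """Return a tour (list of vertices) of length at most twice the MST weight."""
    n = len(dist)
    parent = mst_parents(dist)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    # Preorder walk: the doubled-edge traversal with repeat visits skipped.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

# Hypothetical example: four cities at the corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
tour = approx_tsp(dist)
print(tour, tour_length(dist, tour))
```

On this example the optimal tour has length 4, so the guarantee says the tour returned has length at most 8.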

Final Tour Distance of Tour = 7,032 Optimal Tour = 6,859 [Figure: the final tour on the example with vertices numbered 1 to 16]
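
As a quick check on these two numbers, writing $\mathrm{alg}(x)$ for the length of the tour found and $\mathrm{opt}(x)$ for the optimal length, the ratio on this instance is far below the worst-case guarantee of 2:

```latex
\[
  \frac{\mathrm{alg}(x)}{\mathrm{opt}(x)} \;=\; \frac{7032}{6859} \;\approx\; 1.025 \;\le\; 2
\]
```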

Another Restriction Euclidean TSP: Cities lie in the plane with coordinates $(x_i, y_i)$. The distance between cities i and j is the familiar expression: $d(i, j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$. This version remains NP-hard! [Figure: right triangle between $(x_i, y_i)$ and $(x_j, y_j)$ with legs $x_j - x_i$ and $y_j - y_i$]

Approximation for Euclidean TSP In 1996, Arora and, independently, Mitchell showed that for any constant $\epsilon > 0$ one can find in polynomial time a tour that has length at most $1+\epsilon$ times that of the optimal. The running time of the algorithm goes like $n^{1/\epsilon}$. Such a result with just the triangle inequality would imply P=NP. Just last week they were awarded the Gödel Prize.

Spectrum of Approximation Before you think the news is all good... There are other NP-optimization problems which remain hard even to just beat the trivial approximation. Independent Set: Say that when an n-vertex graph has an independent set of size k you could efficiently produce an independent set of size $k / n^{0.99}$. Such an algorithm could be used to solve all of NP!