SAT-CNF Is NP-complete

Rod Howell
Kansas State University
November 9, 2000

Copyright © 2000, Rod Howell. This paper may be copied or printed in its entirety for use in conjunction with CIS 775, Analysis of Algorithms, at Kansas State University. Otherwise, no portion of this paper may be reproduced in any form or by any electronic or mechanical means without permission in writing from Rod Howell.

The purpose of this paper is to give a detailed presentation of an NP-completeness proof using the definition of NP given by Brassard and Bratley [1] and the following definition of NP-hardness:

Definition 1 A decision problem $Y$ is NP-hard if for every $X \in$ NP, $X \le_m^p Y$.

We focus on the following two problems:

Satisfiability (SAT):
Input: A formula $f$ over boolean variables with operators $\wedge$, $\vee$, and $\neg$.
Question: Is there an assignment of boolean values to the variables in $f$ such that $f$ is true?

Conjunctive Normal Form Satisfiability (SAT-CNF):
Input: A formula $f$ in conjunctive normal form (CNF), i.e., of the form
$$\bigwedge_{i=1}^{m} \bigvee_{j=1}^{k_i} \alpha_{ij},$$
where each $\alpha_{ij}$ is either a boolean variable or the negation of a boolean variable.
Question: Is there an assignment of boolean values to the variables in $f$ such that $f$ is true?

We will show that SAT-CNF is NP-complete (this fact is originally due to Cook [2]). We assume the following theorem, also due to Cook [2]:

Theorem 1 SAT is NP-hard.
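To make the two problem statements concrete, the following Python sketch fixes one possible encoding of formulas as nested tuples. The encoding and all function names (`evaluate`, `is_cnf`, `brute_force_sat`, and so on) are our own assumptions for illustration, not part of the paper. `is_cnf` checks the shape required of SAT-CNF inputs, and `brute_force_sat` answers the shared Question by trying every assignment; it takes exponential time and only pins down what is being asked.

```python
from itertools import product

# Hypothetical encoding chosen for illustration (not from the paper):
#   ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g)
# "and" and "or" are binary; longer conjunctions/disjunctions are nested.

def variables(f):
    """Return the set of variable names occurring in formula f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(sub) for sub in f[1:]))

def evaluate(f, assignment):
    """Evaluate f under a dict mapping variable names to booleans."""
    op = f[0]
    if op == "var":
        return assignment[f[1]]
    if op == "not":
        return not evaluate(f[1], assignment)
    if op == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    if op == "or":
        return evaluate(f[1], assignment) or evaluate(f[2], assignment)
    raise ValueError(f"unknown operator {op!r}")

def is_literal(f):
    """A variable or the negation of a variable (an alpha_ij)."""
    return f[0] == "var" or (f[0] == "not" and f[1][0] == "var")

def is_clause(f):
    """A disjunction of one or more literals."""
    if f[0] == "or":
        return is_clause(f[1]) and is_clause(f[2])
    return is_literal(f)

def is_cnf(f):
    """A conjunction of one or more clauses, i.e., a valid SAT-CNF input."""
    if f[0] == "and":
        return is_cnf(f[1]) and is_cnf(f[2])
    return is_clause(f)

def brute_force_sat(f):
    """Answer the SAT question by trying all 2^k assignments (exponential)."""
    names = sorted(variables(f))
    return any(evaluate(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))
```

For example, $(x \vee \neg y) \wedge y$ is a SAT-CNF input, while $(x \wedge y) \vee x$ is a SAT input that is not in CNF.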
We first show that SAT $\in$ NP. Because SAT-CNF is a special case of SAT, it will follow that SAT-CNF $\in$ NP.

Theorem 2 SAT $\in$ NP.

Proof: Our proof space $Q$ will be the set of finite sequences of boolean values. Let $\Phi$ be the set of all boolean formulas, and let $V$ be the set of all boolean variables. We define a partial function $g : \Phi \times \mathbb{Z}^+ \to V$ so that $g(f, k)$ is the variable in $f$ whose first occurrence is the $k$th among the first occurrences of all variables in $f$; if $f$ contains fewer than $k$ variables, $g(f, k)$ is undefined. We then define $F \subseteq \mathrm{SAT} \times Q$ so that $\langle f, \langle b_1, b_2, \ldots, b_m \rangle \rangle \in F$ iff $f$ contains exactly $m$ variables, and the assignment of the boolean value $b_k$ to the variable $g(f, k)$ for $1 \le k \le m$ is a satisfying assignment for $f$. Thus, for each $f \in$ SAT, there is a $q \in Q$ no longer than $f$ such that $\langle f, q \rangle \in F$ (each of the $m$ variables occupies at least one symbol of $f$).

We will now show that we can decide whether $\langle f, q \rangle \in F$ in $O(n^2)$ time, where $n$ is the sum of the lengths of $f$ and $q$. Our algorithm proceeds as follows:

1. We first construct a list of all the variables in $f$, ordered by the position of their first occurrence. As we construct this list, we record the position in this list of each occurrence of each variable in $f$. This can be done in $O(n^2)$ time, because each of the at most $n$ variable occurrences requires a search of a list of at most $n$ variables.

2. We then verify that the length of $q$ is exactly the number of variables in $f$. This can clearly be done in $O(1)$ time if $q$ is stored in an array.

3. We then verify that $f$ is satisfied by assigning the $k$th variable in the variable list the $k$th boolean value in $q$, for all $k$. We accomplish this by a straightforward recursive evaluation of $f$, looking up the value of each variable using its recorded position in the variable list. Assuming the sequence $q$ is stored as an array, this evaluation can be done in $O(n)$ time.

The total time is therefore $O(n^2)$.

Corollary 1 SAT-CNF $\in$ NP.

Corollary 2 SAT is NP-complete.

Our proof that SAT-CNF is NP-hard will involve a reduction from SAT to SAT-CNF. This reduction will consist of two steps.
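Before describing the two steps of the reduction, here is a minimal sketch of the verification procedure from the proof of Theorem 2. It assumes a hypothetical encoding of formulas as nested tuples, `("var", name)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, and represents $q$ as a Python list of booleans; the name `verify` is ours. A left-to-right traversal visits variable occurrences in the order they appear in $f$, so it produces the list of step 1. A dictionary stands in for the recorded positions; assuming constant-time dictionary operations, this only improves on the $O(n^2)$ bound.

```python
# Hypothetical encoding (not from the paper): a formula is a nested tuple
#   ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g),
# and a proof q is a list of booleans.

def verify(f, q):
    """Return True iff <f, q> is in F, following steps 1-3 of Theorem 2."""
    # Step 1: list the variables by first occurrence (left to right) and
    # record each variable's position in that list.
    order, position = [], {}

    def collect(h):
        if h[0] == "var":
            if h[1] not in position:
                position[h[1]] = len(order)
                order.append(h[1])
        else:
            for sub in h[1:]:
                collect(sub)

    collect(f)

    # Step 2: q must supply exactly one value per variable.
    if len(q) != len(order):
        return False

    # Step 3: evaluate f, reading the kth variable's value from the kth entry of q.
    def ev(h):
        op = h[0]
        if op == "var":
            return q[position[h[1]]]
        if op == "not":
            return not ev(h[1])
        if op == "and":
            return ev(h[1]) and ev(h[2])
        if op == "or":
            return ev(h[1]) or ev(h[2])
        raise ValueError(f"unknown operator {op!r}")

    return ev(f)
```

For instance, with $f = (y \vee \neg x) \wedge x$ the variable list is $y, x$, so `verify(f, [True, True])` returns `True`, while `verify(f, [False, True])` returns `False`.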
First, we will convert the given boolean formula to an equivalent formula in which negation is applied only to variables, not to arbitrary subexpressions. We will then construct from the resulting formula a formula in CNF that is satisfiable iff the original formula is satisfiable. This CNF formula will not, in general, be equivalent to the original formula. We do not construct an equivalent CNF formula because such a conversion can produce an exponentially larger formula, and hence requires exponential time in the worst case.

We begin by showing how negations can be moved to the variables.
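The following sketch illustrates why the reduction avoids an equivalent CNF formula. It assumes a hypothetical encoding of formulas as nested tuples (`("var", name)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`) with negations already applied only to variables, and converts to an equivalent CNF by the standard method of distributing $\vee$ over $\wedge$; the function names are ours. On $(x_1 \wedge y_1) \vee (x_2 \wedge y_2) \vee \cdots \vee (x_n \wedge y_n)$, which has only $2n$ variable occurrences, this method produces $2^n$ conjuncts.

```python
def naive_cnf(f):
    """Equivalent CNF by distributing 'or' over 'and'.

    Assumes 'not' is applied only to variables. Returns a list of
    conjuncts, each conjunct being a list of literals.
    """
    op = f[0]
    if op in ("var", "not"):              # a literal
        return [[f]]
    left, right = naive_cnf(f[1]), naive_cnf(f[2])
    if op == "and":
        return left + right
    if op == "or":
        # (A1 & ... & Ap) | (B1 & ... & Bq) == AND over all i, j of (Ai | Bj)
        return [a + b for a in left for b in right]
    raise ValueError(f"unknown operator {op!r}")

def pairs_formula(n):
    """(x1 & y1) | (x2 & y2) | ... | (xn & yn), with 2n variable occurrences."""
    f = ("and", ("var", "x1"), ("var", "y1"))
    for i in range(2, n + 1):
        f = ("or", f, ("and", ("var", f"x{i}"), ("var", f"y{i}")))
    return f
```

Each additional disjunct doubles the number of conjuncts, so the output grows as $2^n$ while the input grows linearly in $n$.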
Lemma 1 There is a linear-time algorithm to convert arbitrary boolean formulas to equivalent boolean formulas in which negation is applied only to variables.

Proof: Our algorithm uses the following laws of boolean algebra:
$$\neg(f \wedge g) = \neg f \vee \neg g; \tag{1}$$
$$\neg(f \vee g) = \neg f \wedge \neg g; \text{ and} \tag{2}$$
$$\neg\neg f = f. \tag{3}$$
We can apply one of these three laws to any subexpression $\neg f$, where $f$ is not a variable, to obtain an equivalent subexpression in which all negated subexpressions are strictly shorter than $f$. A straightforward recursive implementation of this strategy produces an equivalent formula of the proper form in linear time.

We now define a literal to be either a variable or the negation of a variable. Using this definition, we can describe our reduction from SAT to SAT-CNF.

Theorem 3 SAT-CNF is NP-hard.

Proof: We will show that SAT $\le_m^p$ SAT-CNF. From Theorem 1 and the transitivity of $\le_m^p$, it will follow that SAT-CNF is NP-hard.

Let $f$ be an arbitrary boolean formula. Our algorithm first converts $f$ to an equivalent formula $f'$ in which negations are applied only to variables. From Lemma 1, this can be done in linear time. Furthermore, because the conversion can be done in linear time, the length of $f'$ is linear in the length of $f$.

For the next step of our algorithm, we assume the existence of a polynomial-time algorithm to generate new unique variables. In particular, after the algorithm is initialized, we can call it arbitrarily many times, and each time it will return a variable different from any variable in $f$ and any variable it had previously returned. It is not hard to design such an algorithm that returns $n$ variables in time polynomial in $n$ and the length of $f$.

We now describe a recursive algorithm that takes a formula $\phi$ in which negations are applied only to variables, and produces a formula $\phi'$ in CNF that is satisfiable iff $\phi$ is satisfiable. Let $V$ be the set of variables in $\phi$ and $V'$ be the set of variables in $\phi'$.
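Specifically, $\phi$ will be satisfied by an assignment $g : V \to \{\mathit{true}, \mathit{false}\}$ iff $\phi'$ is satisfied by some assignment $g' : V' \to \{\mathit{true}, \mathit{false}\}$ such that $g'(v) = g(v)$ for all $v \in V$. We establish this property by induction on the structure of $\phi$.

The base case occurs when $\phi$ is a literal. In this case, the algorithm simply returns $\phi' = \phi$, which is in CNF, and the property holds with $g' = g$. Otherwise, there are two cases.

Case 1: $\phi = \phi_1 \wedge \phi_2$. We first recursively compute $\phi_1'$ and $\phi_2'$. We then return $\phi' = \phi_1' \wedge \phi_2'$. Because $\phi_1'$ and $\phi_2'$ are both in CNF, $\phi'$ is in CNF.

Suppose $\phi$ is satisfied by some assignment $g$. Then $g$ must satisfy both $\phi_1$ and $\phi_2$. Let $V_1$ be the set of variables in either $\phi_1'$ or $V$, and let $V_2$ be the set of variables in either $\phi_2'$ or $V$. Then $V' = V_1 \cup V_2$, and, because the new variables in $\phi_1'$ and $\phi_2'$ are distinct from each other and from the variables in $V$, $V = V_1 \cap V_2$. By the induction hypothesis, there must be assignments $g_1 : V_1 \to \{\mathit{true}, \mathit{false}\}$ satisfying $\phi_1'$ and $g_2 : V_2 \to \{\mathit{true}, \mathit{false}\}$ satisfying $\phi_2'$ such that for all $v \in V$, $g(v) = g_1(v) = g_2(v)$. Let $g' : V' \to \{\mathit{true}, \mathit{false}\}$ be defined as
$$g'(v) = \begin{cases} g_1(v) & \text{if } v \in V_1 \\ g_2(v) & \text{if } v \in V_2. \end{cases}$$
This definition is consistent because $g_1$ and $g_2$ agree on $V_1 \cap V_2 = V$. Clearly, $g'$ satisfies $\phi'$.

Now suppose $\phi'$ is satisfied by some assignment $g'$. Then $g'$ must satisfy both $\phi_1'$ and $\phi_2'$; hence, by the induction hypothesis, it also satisfies both $\phi_1$ and $\phi_2$, and therefore $\phi$.

Case 2: $\phi = \phi_1 \vee \phi_2$. We first recursively compute $\phi_1'$ and $\phi_2'$, then generate a new variable $x$. We then change each conjunct $c$ of $\phi_1'$ to $x \vee c$, and change each conjunct $c$ of $\phi_2'$ to $\neg x \vee c$. We return $\phi'$, the conjunction of the resulting two formulas. Again, because both $\phi_1'$ and $\phi_2'$ are in CNF, $\phi'$ is in CNF.

Suppose $\phi$ is satisfied by some assignment $g$. Then $g$ must satisfy at least one of $\phi_1$ or $\phi_2$. Assume $g$ satisfies $\phi_1$; the other case is handled symmetrically. Define $V_1$ and $V_2$ as in Case 1. Then $V' = V_1 \cup V_2 \cup \{x\}$ and $V = V_1 \cap V_2$. By the induction hypothesis, there must be an assignment $g_1 : V_1 \to \{\mathit{true}, \mathit{false}\}$ satisfying $\phi_1'$ such that for all $v \in V$, $g(v) = g_1(v)$. Let $g' : V' \to \{\mathit{true}, \mathit{false}\}$ be defined as
$$g'(v) = \begin{cases} g_1(v) & \text{if } v \in V_1 \\ \mathit{false} & \text{if } v = x \\ \mathit{true} & \text{otherwise.} \end{cases}$$
Each conjunct $x \vee c$ derived from $\phi_1'$ is satisfied by $g'$ because $c$ is satisfied by $g_1$, and each conjunct $\neg x \vee c$ derived from $\phi_2'$ is satisfied because $g'(x) = \mathit{false}$. Hence, $g'$ satisfies $\phi'$.

Now suppose $\phi'$ is satisfied by some assignment $g'$. Assume $g'(x) = \mathit{false}$; the other case is handled symmetrically. Then each conjunct of $\phi'$ that was derived from $\phi_1'$ must contain a literal other than $x$ that is true under $g'$; furthermore, each of these true literals must contain a variable from $V_1$. It follows that $\phi_1'$ is satisfied by $g'$; hence, by the induction hypothesis, $\phi_1$ and $\phi$ are satisfied by $g'$.

In order to complete the proof, we must show that the entire algorithm operates in polynomial time. To facilitate this, we will first show that the formula $\phi'$ produced by the recursive algorithm described above contains no more conjuncts than $\phi$ has literals, and that each conjunct contains no more literals than the number of $\vee$s in $\phi$, plus 1.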
The bound on the number of conjuncts follows immediately from the fact that only the base case introduces new conjuncts, and it introduces exactly one conjunct for each literal. The bound on the number of literals in each conjunct follows from the fact that only the base case and Case 2 increase the size of a conjunct. The base case creates a new conjunct (effectively increasing its size from 0 literals to 1 literal), and Case 2 adds 1 literal to each existing conjunct of $\phi_1'$ and $\phi_2'$. Because Case 2 is executed once for each $\vee$ in $\phi$, the number of literals in any conjunct in $\phi'$ is at most the number of $\vee$s in $\phi$ plus 1.
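Putting the pieces together, the following Python sketch follows the reduction in the proof of Theorem 3: `nnf` applies laws (1)-(3) of Lemma 1, `fresh_names` plays the role of the assumed generator of new variables, and `to_cnf` follows the base case, Case 1, and Case 2. Everything here, including the nested-tuple encoding, the `z0, z1, ...` naming scheme, and all function names, is a hypothetical illustration rather than the paper's own code. Conjuncts are ordinary Python lists, so Case 1 copies lists instead of concatenating linked lists in constant time; the brute-force checkers at the end are exponential and serve only to test small examples.

```python
from itertools import count, product

# Hypothetical encoding (not from the paper): a formula is a nested tuple
#   ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g).
# A CNF formula is a list of conjuncts; each conjunct is a list of literals.

def nnf(f):
    """Lemma 1: move negations down to the variables using laws (1)-(3)."""
    op = f[0]
    if op == "var":
        return f
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    if op != "not":
        raise ValueError(f"unknown operator {op!r}")
    g = f[1]
    if g[0] == "var":
        return f                                   # already a literal
    if g[0] == "not":
        return nnf(g[1])                           # law (3)
    dual = "or" if g[0] == "and" else "and"        # laws (1) and (2)
    return (dual, nnf(("not", g[1])), nnf(("not", g[2])))

def variables(f):
    """Set of variable names occurring in f (also works on a single literal)."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(sub) for sub in f[1:]))

def fresh_names(used):
    """Assumed new-variable generator: z0, z1, ... skipping names in `used`."""
    for i in count():
        if f"z{i}" not in used:
            yield f"z{i}"

def to_cnf(phi, fresh):
    """Recursive algorithm of Theorem 3; phi must already be in NNF."""
    op = phi[0]
    if op in ("var", "not"):                       # base case: phi is a literal
        return [[phi]]
    left, right = to_cnf(phi[1], fresh), to_cnf(phi[2], fresh)
    if op == "and":                                # Case 1: phi1' AND phi2'
        return left + right
    x = ("var", next(fresh))                       # Case 2: new variable x
    return ([[x] + c for c in left] +              # x OR c, c in phi1'
            [[("not", x)] + c for c in right])     # NOT x OR c, c in phi2'

def reduce_sat_to_sat_cnf(f):
    """f -> f' (negations on variables only) -> phi' (CNF, equisatisfiable)."""
    return to_cnf(nnf(f), fresh_names(variables(f)))

def evaluate(f, a):
    """Value of formula (or literal) f under assignment dict a."""
    op = f[0]
    if op == "var":
        return a[f[1]]
    if op == "not":
        return not evaluate(f[1], a)
    if op == "and":
        return evaluate(f[1], a) and evaluate(f[2], a)
    return evaluate(f[1], a) or evaluate(f[2], a)  # op == "or"

def satisfiable(f):
    """Brute-force SAT check, for testing small examples only."""
    names = sorted(variables(f))
    return any(evaluate(f, dict(zip(names, bits)))
               for bits in product((False, True), repeat=len(names)))

def cnf_satisfiable(clauses):
    """Brute-force check of a CNF given as a list of lists of literals."""
    names = sorted({v for c in clauses for lit in c for v in variables(lit)})
    for bits in product((False, True), repeat=len(names)):
        a = dict(zip(names, bits))
        if all(any(evaluate(lit, a) for lit in c) for c in clauses):
            return True
    return False
```

For example, on $\neg((x \vee \neg y) \wedge \neg(y \wedge z))$, `nnf` yields $(\neg x \wedge y) \vee (y \wedge z)$, and `reduce_sat_to_sat_cnf` returns the four conjuncts $z_0 \vee \neg x$, $z_0 \vee y$, $\neg z_0 \vee y$, and $\neg z_0 \vee z$: one conjunct per literal, each with at most one more literal than the single $\vee$.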
The recursive algorithm can be implemented to represent $\phi'$ as a linked list with head and tail pointers. Each element in the list represents a conjunct. Each conjunct in turn is represented by a linked list of literals. Using this representation, Case 1 concatenates two lists in constant time, and Case 2 adds a literal to the front of each conjunct in constant time per conjunct. It is therefore easily seen that if we ignore the time needed to generate new variables, the recursive algorithm can be implemented to run in time linear in the size of $\phi'$. By the bounds above, $\phi'$ contains at most $\ell(k + 1)$ literals, where $\ell$ is the number of literals and $k$ is the number of $\vee$s in $f'$; because the length of $f'$ is linear in the length of $f$, the size of $\phi'$ is at most quadratic, and hence polynomial, in the size of $f$. The algorithm needs one new variable for each $\vee$ in $f'$, so the number of new variables is linear in the size of $f$, and they can be generated in polynomial time. Therefore, the entire algorithm runs in time polynomial in the size of $f$.

Corollary 3 SAT-CNF is NP-complete.

References

[1] Gilles Brassard and Paul Bratley. Fundamentals of Algorithmics. Prentice Hall, 1996.

[2] Stephen Cook. The complexity of theorem-proving procedures. In Proc. Third Annual ACM Symposium on the Theory of Computing, pages 151–158, 1971.