CS227: Assignment 1 Report


Lei Huang and Lawson Wong
April 20, 2008

1 Introduction

Propositional satisfiability (SAT) problems have been of great historical and practical significance in AI. Despite being fundamental, SAT problems are NP-hard and are difficult to solve using complete search, such as the Davis-Putnam procedure [1]. Selman, Levesque, and Mitchell [2] proposed a greedy local search procedure, GSAT, that was able to quickly solve many hard problems that would take very long using complete methods. Many variants of GSAT have since appeared. We examine the relative performance of GSAT and three of these variants: HSAT [3], GSAT with random walk [4], and WalkSAT [4].

2 Algorithms

The simple GSAT procedure is reproduced in Algorithm 1.

Algorithm 1 GSAT [2]
 1: while run-time < MAX-TIME (or num-tries < MAX-TRIES) do
 2:   T := random initialization of variables in Σ, the set of clauses in CNF
 3:   for i := 1 to MAX-FLIPS do
 4:     if T satisfies Σ then
 5:       return T
 6:     else
 7:       for each variable v do
 8:         score[v] := change in # of satisfied clauses in Σ with v's value flipped in T
 9:       end for
10:       poss-flips := {v | score[v] = max_w score[w]}
11:       v := choose a variable at random from poss-flips
12:       T := T with v's value flipped
13:     end if
14:   end for
15: end while
16: return "No satisfying assignment found"
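Algorithm 1 translates almost line for line into Python. The sketch below is our own illustration, not the report's implementation; clauses are encoded as lists of signed integers, where +v denotes variable v and -v its negation:

```python
import random

def num_satisfied(clauses, assignment):
    """Count clauses with at least one true literal.
    A literal is a signed integer: +v means variable v, -v its negation."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def gsat(clauses, n_vars, max_tries, max_flips):
    for _ in range(max_tries):
        # T := random truth assignment over variables 1..n_vars
        t = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            sat = num_satisfied(clauses, t)
            if sat == len(clauses):
                return t
            # score[v] := change in # of satisfied clauses if v were flipped
            # (recomputed from scratch here; Section 2.1 caches this instead)
            scores = {}
            for v in t:
                t[v] = not t[v]
                scores[v] = num_satisfied(clauses, t) - sat
                t[v] = not t[v]
            best = max(scores.values())
            poss_flips = [v for v in t if scores[v] == best]
            v = random.choice(poss_flips)
            t[v] = not t[v]
    return None   # no satisfying assignment found
```

The full-rescoring inner loop makes each flip O(KLN); the optimizations in Section 2.1 reduce this.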

GSAT randomly chooses a truth assignment for the variables, then flips individual variable truth values until a satisfying assignment is found, or until timeout (i.e., no solution found). The choice of which variable to flip is made by a hill-climbing step: each variable is given a score, and from the set of variables with maximum score, one is selected at random. The score of a variable is defined as the change in the number of satisfied clauses if the variable's truth value were flipped. Hence the more positive a variable's score, the greater the number of clauses satisfied after flipping it. This hill-climbing heuristic is greedy, as it tries to maximize the next step's total number of satisfied clauses.

To study the hill-climbing procedure more closely, Gent and Walsh [3] proposed HSAT, which differs from GSAT only in the way the next variable to flip is chosen from poss-flips. Instead of choosing this variable randomly, the variable that was flipped longest ago (if ever) is chosen. In detail, we keep an extra array when, indexed by variables, where when[v] is initialized to 0 for all variables v, and on step (flip) i, if v is chosen, then when[v] := i. Hence this array stores the last iteration on which each variable was flipped, or 0 if it has never been flipped. The variable in poss-flips with minimum value in when is chosen (ties broken arbitrarily). HSAT therefore adds memory to the procedure.

One major problem with applying greedy local search methods such as GSAT and HSAT to a non-convex problem space is that searches easily become stuck at local optima, which for SAT means an assignment satisfying all clauses can never be found. To address this problem, Selman, Kautz, and Cohen [4] proposed the use of random walk to add randomness to the search and allow escape from local optima.
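HSAT's tie-breaking rule above reduces to a one-line helper (a sketch with our own function name; `when` is the array just described):

```python
def hsat_pick(poss_flips, when):
    # Choose the candidate flipped longest ago; never-flipped variables
    # (when[v] == 0) are preferred over all others.
    return min(poss_flips, key=lambda v: when[v])
```

After flipping v on step i, the caller records when[v] = i.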
GSAT with random walk does this by simply choosing, with probability p = 0.5, not to use hill-climbing; instead, it picks an unsatisfied clause and sets poss-flips to contain all variables occurring in this clause. Otherwise (with probability 1 − p), standard GSAT hill-climbing is used to determine poss-flips.

The authors also proposed a more extreme version, WalkSAT, which differs from GSAT more than the previous variants do. In WalkSAT, an unsatisfied clause is always picked; in this way, random walk is always used. With probability p = 0.5, greediness is added by setting poss-flips to the set of variables in the clause with minimum break count, where break count is defined as the change in the number of unsatisfied clauses if the variable's truth value were flipped. If a clause changes from satisfied to unsatisfied, the number of unsatisfied clauses increases by 1 and the number of satisfied clauses decreases by 1, and vice versa for unsatisfied to satisfied; hence (change in # of unsatisfied clauses) = −(change in # of satisfied clauses). Since we want to minimize the change in unsatisfied clauses, we can equivalently maximize the change in satisfied clauses, so the greedy procedure of minimizing the break count is equivalent to the original procedure of maximizing the score. Hence the score computation shown in Algorithm 1 is still used in WalkSAT. With probability p, WalkSAT therefore sets poss-flips to contain the variables in the random unsatisfied clause that have maximum score; otherwise, no greediness is used and all variables in the clause are added to poss-flips.

2.1 Optimizations

In each iteration of the inner loop, there are three steps that take more than O(1) time: checking whether T is a satisfying assignment, computing scores for each variable, and determining poss-flips. Checking T is an O(KL) task, where K = maximum number of literals per clause and L = number of clauses. To compute a score for a variable, each clause

must be checked to see if it is satisfied, which may take O(K) time (e.g., if all literals are false). Since this must be done for each clause and each variable, score computation takes O(KLN) time, where N = number of variables. Finally, determining the poss-flips set takes O(N) time, as each variable's score must be compared.

Clearly, the most expensive step is score computation. Initially, when scores are unknown, each literal of each clause must be considered for each variable flip, hence an O(KLN) computation is necessary. However, between iterations much of this is wasted work, as consecutive iterations differ by only one variable flip. This is the idea behind score caching: given the current scores and the number of true literals for each clause before and after a variable flip, we can use specific rules to determine the new scores after the flip. The number of true literals in a clause indicates whether the clause is satisfied (> 0) or not (= 0). If the number of true literals in clause c, by flipping variable v's truth value, changes from:

0 → 1: All literals were false, so c gave each clause variable a score of +1. Now c is satisfied because of v; if v is flipped again, c becomes unsatisfied, so v's score from c is now −1. Flipping any other literal still leaves c satisfied, so all other clause variables now have score 0 from c. Hence the change in score is −2 for v and −1 for the other clause variables.

1 → 0: Flipping v caused c to become unsatisfied, so v had a score of −1. Since c was satisfied, flipping other clause variables had no effect, so their score was 0. Now flipping any clause variable makes c true, so each has score +1. Hence the change in score is +2 for v and +1 for the other clause variables.

1 → 2: Flipping v increased the number of true literals, so v's literal must have been false. Then there is another variable w whose literal in c was true. Since this was the only true literal, its score was −1, as flipping it would have made c unsatisfied. Since c was satisfied, flipping clause variables other than w had no effect, so their score was 0. Now, with 2 true literals, flipping either of them still leaves c satisfied by the other true literal, so their scores are 0. Flipping other clause variables has no effect, so their score is 0. Hence the change in score is +1 for w, the original sole true literal.

2 → 1: Flipping v decreased the number of true literals, so v's literal must have been true. Then there is another variable w whose literal was true. Since there were 2 true literals, falsifying either did not take away c's support, so c remained satisfied and their scores were 0. Since c was true, the scores of all other clause variables were also 0. Now w becomes the sole support of c, so its score is now −1. Since c is still true, all other clause variables have score 0. Hence the change in score is −1 for w, the new sole true literal.

Otherwise: Either the number of true literals did not change, or it changed between counts that are both ≥ 2. In the former case, v does not occur in c, so c's contribution to the scores remains the same. In the latter case, we saw above that when a clause has ≥ 2 true literals it remains satisfied on any single flip, as it does not lose its support, hence the clause's contribution to its clause variables' scores is 0. The score of all clause variables from c is therefore 0 both before and after, so there is no change in score.

Given the original scores (for each variable) and number of true literals (for each clause), we can use the above rules to more efficiently compute the new scores and number of true

literals. First, use O(KL) time to compute the new number of true literals for each clause. Then, for each clause, determine the correct rule from the old and new numbers of true literals, and apply the rule to each of its literals to get each clause variable's new score from its old score. This also uses O(KL) time, so overall score caching and updating uses only O(KL) time. Note that the full O(KLN) score computation must still be performed on the initial truth assignment, but on inner-loop iterations only O(KL) updating is necessary (and it should be performed at the end of the loop to avoid initial redundancy).

In the random walk variants of GSAT, the current set of unsatisfied clauses is sampled from regularly. Now that score caching computes the new number of true literals per clause in each iteration, this set can be easily maintained: if the change in the number of true literals was from 1 → 0, add the clause to the set, and if from 0 → 1, remove it. The initial unsatisfied set can be deduced while computing the initial numbers of true literals (a clause is in the set if and only if it has 0 true literals). There is an additional bonus from doing this: instead of having to use O(KL) time to check whether T is a satisfying assignment, which is the first non-O(1) step noted at the beginning of this section, this check can now be done in O(1) time by simply testing whether the unsatisfied set is empty. The final non-O(1) step, determining poss-flips, is rather fast already (O(N), which in practice is significantly smaller than the other two steps), and was not optimized.

3 Experimental Results

3.1 Setting Max-Flips

Finding the correct value of Max-flips is important, as it controls how quickly to give up on a search path. If Max-flips is too low, it is likely that the search has yet to converge to a solution, resulting in thrashing behavior where many repeated restarts occur. In contrast, if Max-flips is too high, more time will be wasted on unfruitful paths, such as those starting from poor initial assignments or those stuck in local optima. Gent and Walsh [3] also analyzed this problem and confirmed the above observation: there indeed exists an optimal value of Max-flips that minimizes the average total flips (a measure of the amount of work done). They also empirically found that the optimal Max-flips varies with N in an approximately O(N²) fashion. However, they only found the optimal Max-flips values for N ≤ 100; as we evaluate much larger problems, we need a better way to determine a good Max-flips value given N. One note from [3] was that the Max-flips optimum is not very sharp, so we crudely tested on integral multiples of N. Also, it was suggested that Max-flips does not depend significantly on the algorithm, hence we used only HSAT and WalkSAT to determine the optimal Max-flips. Hard random 3-SAT problems (see Section 3.2 below) with N = 50, 100, 150, 200, 250, 300 were chosen and evaluated with Max-flips = c·N for 1 ≤ c ≤ 10. The resulting trend was remarkably simple: HSAT was optimal for N = 50 when c = 1, for N = 100 when c = 2, and so on, in a generally linear manner. WalkSAT had a similar trend, except with double the c values, i.e., N = 100 was optimal when c = 4. For a given N, we therefore take c = N/50 for GSAT and HSAT, and c = N/25 for GSAT with random walk and WalkSAT, and set Max-flips = c·N. Hence for GSAT and HSAT, Max-flips = N²/50, and double that for the random walk variants. Max-flips therefore does appear to grow with N².
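Before turning to the results, the caching rules of Section 2.1 can be made concrete in a short sketch (names such as `update_scores` and `var_clauses` are our own, not the report's implementation; `scores` and `true_count` must hold the pre-flip values when the function is called):

```python
def update_scores(clauses, var_clauses, scores, true_count, flipped, assignment):
    """Apply the Section 2.1 rules after variable `flipped` has changed
    value in `assignment`. `scores` and `true_count` still hold pre-flip
    values; var_clauses[v] lists indices of clauses containing variable v."""
    for c in var_clauses[flipped]:
        lits = clauses[c]
        old = true_count[c]
        new = sum(assignment[abs(l)] == (l > 0) for l in lits)
        true_count[c] = new
        others = [abs(l) for l in lits if abs(l) != flipped]
        if old == 0 and new == 1:            # 0 -> 1: -2 for v, -1 for others
            scores[flipped] -= 2
            for w in others:
                scores[w] -= 1
        elif old == 1 and new == 0:          # 1 -> 0: +2 for v, +1 for others
            scores[flipped] += 2
            for w in others:
                scores[w] += 1
        elif old == 1 and new == 2:          # 1 -> 2: +1 for the old sole support
            for l in lits:
                if abs(l) != flipped and assignment[abs(l)] == (l > 0):
                    scores[abs(l)] += 1
        elif old == 2 and new == 1:          # 2 -> 1: -1 for the new sole support
            for l in lits:
                if assignment[abs(l)] == (l > 0):
                    scores[abs(l)] -= 1
        # otherwise: no change in score from clause c
```

The unsatisfied-clause set of Section 2.1 can be maintained in the same loop: add c when new == 0 and remove it when old == 0.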

3.2 Random SAT

Table 1: Random 3-SAT Results, N ≤ 300 (100 trials per entry)

Vars  Clauses  Algorithm  Max-Flips  %-Solved  Time    Flips  Tries
50    213      GSAT       50         100%      0.006s  33     4.8
               HSAT       50         100%      0.003s  35     2.9
               GSAT-W     100        100%      0.008s  64     3.7
               WalkSAT    100        100%      0.006s  66     2.9
100   426      GSAT       200        100%      0.08s   150    10.6
               HSAT       200        100%      0.02s   114    2.5
               GSAT-W     400        100%      0.06s   252    4.3
               WalkSAT    400        100%      0.04s   226    3.3
150   639      GSAT       450        100%      1.97s   320    90
               HSAT       450        100%      0.25s   278    11
               GSAT-W     900        100%      0.84s   618    21
               WalkSAT    900        100%      0.76s   648    19
200   852      GSAT       800        100%      15.4s   595    314
               HSAT       800        100%      0.6s    506    12
               GSAT-W     1600       100%      1.6s    1101   18
               WalkSAT    1600       100%      1.6s    1233   18
250   1065     GSAT       1250       85%       82.0s   996    891
               HSAT       1250       100%      3.3s    686    36
               GSAT-W     2500       100%      15.6s   1893   91
               WalkSAT    2500       100%      8.0s    1753   47
300   1278     GSAT       1800       67%       81.6s   1443   471
               HSAT       1800       100%      3.8s    1089   24
               GSAT-W     3600       99%       17.4s   2701   51
               WalkSAT    3600       100%      7.9s    2846   27

We extensively evaluated the four algorithms on hard random 3-SAT formulas of different variable sizes. It has been empirically shown that the hardest problems occur when L ≈ 4.26N, hence satisfiable 3-SAT formulas with N = 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800 adhering to this clause-to-variable ratio were used. To reduce the idiosyncrasies of any specific random formula, 10 instances were generated for each N. For N ≤ 300, 10 trials were conducted per instance, giving a total of 100 trials; for the larger formulas (N ≥ 400), only 2 trials were performed per instance due to time constraints, giving a total of 20 trials. The results are shown in Tables 1 and 2, respectively. All experiments were performed on a Core 2 Quad 2.40GHz computer. In the tables, time is the median running time on one instance (capped at 600s, after which the algorithm was forced to terminate), flips is the median number of flips on the successful try, and tries is the median number of times the algorithm restarted before a satisfying assignment was found.
For example, a tries value of 4.8 should be interpreted as the algorithm iterating up to Max-flips 4 times (hence completing 4 tries unsuccessfully), and on the 5th try requiring 0.8 × Max-flips flips before a satisfying assignment was found. Medians were used to obtain more robust measures of central tendency.

For N ≤ 300, the algorithms display a stable trend. With the exception of GSAT, whose performance began to deteriorate significantly for N ≥ 250, all problems were solved successfully. A very stable ranking of the algorithms is also seen in the running times, with HSAT always fastest, followed by WalkSAT, GSAT with random walk, and GSAT. Especially for N ≥ 200, GSAT began to take significantly more time. This can be explained by the much higher number of tries GSAT takes compared to the other three algorithms, which suggests either that Max-flips was chosen poorly, or that GSAT often becomes stuck on bad search paths. The fact that the number of flips on successful tries was close to, but at a stable margin below, the Max-flips value suggests that Max-flips was generally well chosen. Also, the fact that on successful tries GSAT takes only slightly more flips than HSAT suggests that it is not the case that GSAT chooses particularly bad flips and requires longer paths to reach a satisfying assignment. Rather, it is likely that GSAT often becomes stuck in local optima and hence is unproductive until the next restart. The two random walk variants are able to prevent this by randomly selecting possibly non-locally-optimal variables to flip, which provides opportunities to escape from local optima. The fact that WalkSAT adopts a stronger random walk strategy than GSAT with random walk also explains its lower number of tries and running times.

HSAT escapes the local optima problem in a different way, by flipping the variable in poss-flips that was flipped longest ago. Local optima often cause variables to flip back and forth around the optimum, and the HSAT strategy prevents this from happening. Moreover, as variables that have never been flipped are by definition the ones flipped longest ago, the HSAT strategy also allows exploration of new areas of the search space via these new variables, providing another escape from local optima. Perhaps because HSAT's escape method is always guided by the greedy hill-climbing heuristic (maximum score), it does more fruitful searching than random walk, and hence requires the fewest flips. Both random walk methods require at least double the number of flips of GSAT and HSAT, suggesting that the price of their ability to escape is that more random, unnecessary moves are taken, resulting in slower convergence to satisfying assignments.

For larger formulas (N ≥ 400, as shown in Table 2), these trends are not as clear. It is still the case that GSAT deteriorates very quickly; it was only able to solve a minimal number of instances for N = 400, and failed for larger N (and hence is omitted from the table in such cases). We also see that the performance of the other three algorithms begins to deteriorate, although they maintain their ranking in terms of % of instances solved. One significant difference is that HSAT no longer appears to dominate the other algorithms stably in terms of running time; rather, all three algorithms do well in some cases and worse in others, giving close running times (although we must note that relatively few trials were evaluated, so especially for larger N, when fewer instances are solved, the error in the median can be large). While the number of flips maintains the same trends, the number of tries is in general higher than for lower N, suggesting that increasing N makes it much more difficult for the algorithms to converge to solutions. All the algorithms performed rather poorly at finding satisfying assignments within the Max-time of 10 minutes for N = 800.
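The "stronger random walk strategy" credited to WalkSAT above is its variable selection from Section 2, which can be sketched as follows (a minimal illustration with our own function name; `scores` is the score array of Algorithm 1, so maximum score stands in for minimum break count):

```python
import random

def walksat_pick(clauses, scores, unsat_clauses, p=0.5):
    """One WalkSAT variable choice: always work inside a random
    unsatisfied clause; with probability p, restrict to its
    maximum-score variables (equivalently, minimum break count),
    otherwise take any variable of the clause."""
    clause = clauses[random.choice(list(unsat_clauses))]
    candidates = [abs(lit) for lit in clause]
    if random.random() < p:
        best = max(scores[v] for v in candidates)
        candidates = [v for v in candidates if scores[v] == best]
    return random.choice(candidates)
```

GSAT with random walk differs in that it uses this clause-based choice only with probability p, and plain hill-climbing over all variables otherwise.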

Table 2: Random 3-SAT Results, N ≥ 400 (20 trials per entry)

Vars  Clauses  Algorithm  Max-Flips  %-Solved  Time   Flips   Tries
400   1704     GSAT       3200       10%       276s   2540    772
               HSAT       3200       95%       41s    2053    116
               GSAT-W     6400       75%       41s    4109    61
               WalkSAT    6400       95%       20s    5112    29
500   2130     HSAT       5000       80%       72s    2758    106
               GSAT-W     10000      50%       109s   6664    83
               WalkSAT    10000      65%       90s    8444    70
600   2556     HSAT       7200       65%       240s   4502    108
               GSAT-W     14400      25%       130s   11456   58
               WalkSAT    14400      40%       68s    9181    20
700   2982     HSAT       9800       50%       103s   5327    44
               GSAT-W     19600      30%       100s   13807   20
               WalkSAT    19600      45%       238s   14829   52
800   3408     HSAT       12800      15%       102s   9202    39
               GSAT-W     25600      10%       439s   21323   84
               WalkSAT    25600      30%       107s   19721   19

3.3 Other Benchmarks

3.3.1 Hirsch Formulas

This set of benchmarks is known to be difficult to solve despite its low number of variables (N ≤ 300). The reason for this difficulty is that the formulas are constructed backwards from the truth assignments in order to have certain properties that make them hard to solve; in particular, variables that appear together in one clause are not allowed to appear together in other clauses, which reduces the overlap in the work that can be done by single variable flips and makes the greedy heuristics less effective, as flips have less effect. The results of evaluating the algorithms on 10 instances with 2 trials per instance are shown in Table 6 (at the end). Again, as few trials were performed due to time constraints, individual figures are less meaningful; however, the general trend can still be seen. In terms of % solved, HSAT is still clearly dominant, solving at least as many trials per instance as the other algorithms, and for two instances it is the only algorithm that can find a satisfying truth assignment. HSAT also has comparable and often better running times. GSAT again has the worst performance on these two metrics, succeeding on only 5 of 20 trials. The random walk algorithms work occasionally, often with high running times and, in many instances, with a number of flips close to Max-flips, suggesting either that Max-flips is not well chosen (it should be higher) or that convergence is too slow given the difficulty of the search space. The fact that the number of tries is very high for all instances suggests that the search space is indeed very difficult, in that search paths often fail even with the ability to escape local optima, implying that solutions are well hidden.

3.3.2 Spin-Glass Models

These SAT problems are exceptionally difficult, and no instance was solved by any of the algorithms. Although these are also 3-SAT problems with N ≈ 300-500, their difficulty comes from the intricate structure of their clauses. The clauses combined essentially exhibit a cyclic structure over the variables and connect the variables in order. The challenge this poses is that once the assignment to a certain variable along the chain is fixed, the assignments of all other variables must also fit exactly into place, which is clearly difficult given the randomness and greediness of the algorithms we evaluate. It is therefore too difficult in these problems for the algorithms to find a converging path to a satisfying assignment.

3.3.3 Quasigroup Completion

Table 3: Quasigroup Completion Benchmarks on GSAT-W (2 trials per entry)

Name        Vars  Clauses  Max-Flips  %-Solved  Time   Flips   Tries
qcp-000001  2579  25079    266049     100%      149s   109467  0.42
qcp-000003  4109  43318    675355     50%       402s   197528  0.29
qcp-000042  546   4060     11924      100%      0.95s  3931    0.36
qcp-000150  1490  13511    88804      100%      26s    38687   0.45
qcp-000996  4031  42893    649958     100%      326s   207996  0.32
qwh-000124  4138  46161    684921     100%      386s   166790  0.25
qwh-000641  4061  41699    659668     100%      481s   242800  0.37
qwh-000924  2299  18458    211416     50%       157s   168608  0.81
qwh-000986  3628  35339    526495     100%      289s   164246  0.31

These problems have properties that make it difficult for greedy heuristics to work. Of the 20 instances evaluated (with 2 trials per instance), only GSAT with random walk was able to find a satisfying assignment in any instance, and it succeeded in only 9 of the 20. The difficulty in these problems lies in their structure: approximately 90% of the clauses are purely negative 2-variable clauses (i.e., both literals are negated), and the rest of the clauses are purely positive. Since there are so many more negative clauses, flipping a variable's truth value to false has a high score, because it can cause many negative clauses to be satisfied even if the single positive clause that contains the variable becomes unsatisfied. This would likely cause many negative truth assignments; however, as there are purely positive clauses, some variables must be assigned true. This is difficult to obtain during a greedy hill-climbing procedure, since flipping a truth value from false to true causes many 2-variable negative clauses to rely on their other variable for support, which prevents future flips from false to true, and hence blocks the correct combinations of true variables from occurring. The local optima in this case are therefore very strong; in other words, the true satisfying assignment is very hidden.

An interesting supporting case is the qcp-000042 problem, which is considerably smaller than the other problems. All algorithms can successfully solve this instance, but both

GSAT and HSAT take much longer (about 300s, compared to 0.95s for GSAT with random walk), suggesting that it is very difficult to rely on hill-climbing heuristics alone to find solutions. WalkSAT performed only slightly slower than GSAT with random walk on this instance, so it is unclear why only GSAT with random walk, and not WalkSAT, is successful on the other instances. It is likely that WalkSAT is capable of converging to the solution but is not quick enough given the size of the problem (with N ≈ 4000 in many cases). This is supported by the fact that for GSAT with random walk, even though all solved cases occurred on the first try and the flips used were not near Max-flips, the running time in many cases was close to Max-time. Hence, because WalkSAT is slightly slower, it may have needed longer than Max-time to converge.

3.3.4 Others

Table 4: Other Problems (5 trials per entry)

Name             Vars  Clauses  Algorithm  Max-Flips  %-Solved  Time   Flips   Tries
blocksworld.a    459   4675     GSAT       4213       80%       229s   572     168
                                HSAT       4213       80%       262s   309     178
                                GSAT-W     8427       100%      17s    4605    6
                                WalkSAT    8427       100%      4s     4577    3
logistics.a      828   6718     WalkSAT    27423      100%      257s   16736   43.6
logistics.b      843   7301     WalkSAT    28425      80%       184s   20427   28.8
logistics.c      1141  10719    WalkSAT    52075      80%       261s   41384   15.9
logistics.d      4713  21991    WalkSAT    888494     80%       164s   217767  0.3
graphcoloring.a  500   3100     GSAT       5000       100%      5.61s  1154    4.9
                                HSAT       5000       100%      5.97s  418     5.1
                                GSAT-W     10000      100%      0.87s  4112    0.4
                                WalkSAT    10000      100%      0.80s  3433    0.3
graphcoloring.b  500   3100     GSAT       5000       100%      2.56s  2913    2.2
                                HSAT       5000       100%      1.27s  485     1.1
                                GSAT-W     10000      100%      1.62s  5606    0.7
                                WalkSAT    10000      100%      0.82s  3647    0.4
graphcoloring.c  500   3100     GSAT       5000       100%      13.5s  3798    13
                                HSAT       5000       100%      6.0s   513     6
                                GSAT-W     10000      100%      2.8s   6081    3
                                WalkSAT    10000      100%      20.1s  5228    10
graphcoloring.d  500   3100     GSAT       5000       100%      2.02s  3509    1.8
                                HSAT       5000       100%      0.29s  844     0.3
                                GSAT-W     10000      100%      5.68s  5819    2.6
                                WalkSAT    10000      100%      4.15s  8638    1.9

The algorithms were evaluated on several other problems, including blocks world planning, logistics, and graph coloring.
All of these problems had structure similar to the quasigroup completion problems in Section 3.3.3. A total of 10 instances with 5 trials per instance was evaluated, with results shown in Table 4. The problem blocksworld.b and the non-WalkSAT runs of logistics are omitted, as they were unsolvable. The solvable instances of blocksworld and logistics behaved similarly to the quasigroup completion problems: GSAT and HSAT would take very long, with a very high number of tries indicating that it was very difficult to converge to a satisfying assignment. In contrast, the random walk methods, and especially WalkSAT, performed much better; WalkSAT was the only algorithm that could solve the logistics problems. The low number of tries for GSAT with random walk and WalkSAT on blocksworld.a suggests that random walk strategies, despite their observably slower convergence so far, can often converge successfully to the solution, possibly due to their superior ability to escape local optima. However, it is interesting that in the logistics problems, apart from logistics.d, WalkSAT had a relatively high number of tries, indicating that Max-flips perhaps needs to be tuned more carefully to the specific problem. It is also interesting that in quasigroup completion only GSAT with random walk was successful, whereas here only WalkSAT was; it remains unclear on which problems either algorithm is superior.

One surprising result is that although the graph coloring problems have structure similar to the quasigroup completion problems, all algorithms were able to find satisfying truth assignments on all trials, with comparatively low running times and numbers of tries. This is most likely due to a slight difference in the structure of the problem: whereas in quasigroup completion the purely positive clauses were relatively long and made up approximately 10% of the clauses, in graph coloring the purely positive clauses are short (5 literals) and make up only 3% of the clauses.
The instances are therefore less problematic in terms of local optima, which can be seen from the ability of GSAT and HSAT to solve the problems at speeds comparable to the random walk algorithms. Still, local optima remain a significant issue, as shown by the large difference in number of flips between GSAT and HSAT; HSAT's better method of escaping local optima allowed it to significantly exceed GSAT's performance.

3.4 Score Caching

To evaluate the effectiveness of the score caching optimization, two versions of HSAT were compared, one with score caching and one without. Both versions were tested on the random 3-SAT formulas from Section 3.2 for N = 50, 100, 150, 200, 250, 300, again with 10 instances for each N and 2 trials per instance. The results are shown in Table 5. Note that since the method of solving SAT has not changed (only a speedup optimization was introduced), the two algorithms should perform similarly in terms of flips and tries; this was indeed the case, so those fields are omitted. Clearly, score caching improves the performance of HSAT significantly, with a speedup of about 50 times in running time. In addition, without score caching the percentage of instances solved decreases as N (and the number of tries) increases, as Max-time is often exceeded. For N ≥ 200 this deterioration is significant, and by N = 400 HSAT without score caching cannot find a satisfying assignment for any instance. Score caching is therefore a very effective optimization.
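The caching idea above can be sketched as follows. Each variable's score (the net change in satisfied clauses if it were flipped) is stored, and after a flip only the variables sharing a clause with the flipped variable are rescored, rather than all N variables. The class and function names are illustrative assumptions, not the report's exact implementation, and for brevity the rescore itself reuses the naive from-scratch routine; a full implementation would update scores incrementally from only the clauses containing the flipped variable.

```python
def num_satisfied(assignment, clauses):
    return sum(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def score(var, assignment, clauses):
    """Net gain in satisfied clauses if `var` were flipped
    (the slow, from-scratch computation)."""
    before = num_satisfied(assignment, clauses)
    assignment[var] = not assignment[var]
    after = num_satisfied(assignment, clauses)
    assignment[var] = not assignment[var]    # undo the trial flip
    return after - before

class ScoreCache:
    """Keep every variable's flip score; after a flip, rescore only
    the variables that share some clause with the flipped one."""
    def __init__(self, assignment, clauses):
        self.assignment = assignment
        self.clauses = clauses
        self.neighbors = {v: set() for v in assignment}
        for c in clauses:
            for l in c:
                self.neighbors[abs(l)].update(abs(m) for m in c)
        self.scores = {v: score(v, assignment, clauses) for v in assignment}

    def flip(self, var):
        self.assignment[var] = not self.assignment[var]
        for v in self.neighbors[var]:        # only the touched variables
            self.scores[v] = score(v, self.assignment, self.clauses)
```

On sparse formulas each variable has few neighbors, so a flip touches only a small fraction of the variables; this locality is what makes the roughly 50x speedup in Table 5 plausible.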

Table 5: HSAT With and Without Score Caching (20 trials per entry)

                                      Cache             No Cache
Vars   Clauses   Max-Flips   %-Solved   Time     %-Solved   Time
 50     213         50         100%     0.003s     100%     0.07s
100     426        200         100%     0.02s      100%     1.3s
150     639        450         100%     0.25s      100%     33s
200     852        800         100%     0.6s        90%     96s
250    1065       1250         100%     3.3s        50%     187s
300    1278       1800         100%     3.8s        35%     181s

4 Conclusions

We evaluated four SAT algorithms (GSAT, HSAT, GSAT with random walk, and WalkSAT) against a variety of SAT problems. For hard random 3-SAT formulas, and especially for N ≤ 300, a stable trend was found in which HSAT outperformed the other algorithms in running time, number of flips, and number of tries, followed by WalkSAT, GSAT with random walk, and GSAT. For greater values of N, HSAT continued to outperform the other algorithms, though less stably and by smaller margins. For other benchmarks, different trends were found: when the problems had more structure, as in the quasigroup completion, blocks world planning, logistics, and graph coloring problems, GSAT with random walk and WalkSAT significantly outperformed the two purely greedy local search algorithms. The performance of the evaluated algorithms therefore depends greatly on the nature of the SAT problem; in all cases, however, the variants of GSAT outperformed GSAT itself.

References

[1] M. Davis and H. Putnam, "A computing procedure for quantification theory," JACM, vol. 7, no. 3, 1960.
[2] B. Selman, H. Levesque, and D. Mitchell, "A new method for solving hard satisfiability problems," in AAAI, 1992.
[3] I. Gent and T. Walsh, "Towards an understanding of hill-climbing procedures for SAT," in AAAI, 1993.
[4] B. Selman, H. Kautz, and B. Cohen, "Noise strategies for improving local search," in AAAI, 1994.

Table 6: Hirsch Benchmarks (2 trials per entry)

Name           Vars   Clauses   Algorithm   Max-Flips   %-Solved   Time   Flips   Tries
hgen2-000008   312    1092      GSAT           1946        0%       -       -       -
                                HSAT           1946       50%      101s    421     649
                                GSAT-W         3893       50%      169s   1863     574
                                WalkSAT        3893        0%       -       -       -
hgen2-000032   260     910      GSAT           1352      100%      256s   1105    2790
                                HSAT           1352      100%       45s   1037     489
                                GSAT-W         2704      100%      319s   1753    1784
                                WalkSAT        2704      100%      138s   2284     788
hgen2-000033   278     973      GSAT           1545        0%       -       -       -
                                HSAT           1545      100%      116s    942    1093
                                GSAT-W         3091        0%       -       -       -
                                WalkSAT        3091       50%       39s   2654     198
hgen2-000041   307    1074      GSAT           1884        0%       -       -       -
                                HSAT           1884      100%       77s   1097     514
                                GSAT-W         3769        0%       -       -       -
                                WalkSAT        3769       50%      116s   3488     408
hgen2-000042   260     910      GSAT           1352        0%       -       -       -
                                HSAT           1352      100%      269s    491    2805
                                GSAT-W         2704       50%      354s   1358    2029
                                WalkSAT        2704        0%       -       -       -
hgen2-000047   306    1071      GSAT           1872       50%      429s   1440    2757
                                HSAT           1872      100%      217s    847    1461
                                GSAT-W         3745       50%      249s   2405     880
                                WalkSAT        3745       50%      155s   2808     548
hgen2-000053   300    1050      GSAT           1800        0%       -       -       -
                                HSAT           1800      100%      226s    731    1517
                                GSAT-W         3600       50%      211s   2424     776
                                WalkSAT        3600       50%      301s   3368    1112
hgen2-000061   315    1102      GSAT           1984        0%       -       -       -
                                HSAT           1984       50%      543s   1591    3350
                                GSAT-W         3969        0%       -       -       -
                                WalkSAT        3969        0%       -       -       -
hgen2-000062   278     973      GSAT           1545        0%       -       -       -
                                HSAT           1545      100%      280s    566    2643
                                GSAT-W         3091        0%       -       -       -
                                WalkSAT        3091        0%       -       -       -
hgen2-000081   274     959      GSAT           1501      100%      340s   1107    3202
                                HSAT           1501      100%       20s    837     193
                                GSAT-W         3003      100%      117s   1967     602
                                WalkSAT        3003      100%      317s   2119    1634