On the impact of small-world on local search Andrea Roli andrea.roli@unibo.it DEIS Università degli Studi di Bologna Campus of Cesena p. 1
Motivation The impact of structure whatever it is on search algorithms is dramatically relevant Identify most difficult instances (for a given technique) Understand why an instance is difficult Exploit this bit of information to choose the best solver, or a combination of solvers Evaluate the quality of benchmarks p. 2
Goal Previous work [Walsh, 1999] CSP instances defined over small-world graphs are harder to solve for complete algorithms Question: What about local search behavior on small-world instances? p. 3
Outline Background: Complex networks Structure in CSPs Small-world SAT instances Experimental results Discussion p. 4
Complex networks p. 5
Complex networks System topology is crucial for understanding its dynamics Graph theory provides useful models Complex networks: emerging research field p. 6
Graphs as structure abstraction Entities represented as graph nodes relations arcs Node: either one entity or an entire subsystem p. 7
Main characteristics Node degree (distribution, average, etc.) Diameter, characteristic path length et similia Clustering (i.e., cliquishness tendency) p. 8
Random graphs First developed model for system structure Several important applications Random graphs fail to represent social and biological systems p. 9
Random graphs Node degree distribution: Poissonian (approx Normal) Characteristic path length: low Clustering: low p. 10
Random graphs p. 11
Scale-free networks Relations among individuals in a society (e.g., scientific collaborations) Web pages structure Internet structure... p. 12
Scale-free networks Node degree distribution: nodes with degree k k γ (γ parameter) Very few hubs (but not negligible) and many nodes with few connections Robust wrt random failures Sensitive to attacks p. 13
Scale-free networks p. 14
Scale-free networks formation Growth: older nodes has on average a higher number of connections Preferential attachment: new nodes are more likely to connect to nodes with higher degree (probability proportional to the degree) Model variants that take into account also the fitness p. 15
Small-world Any pair of nodes connected by few hops (short characteristic path length) High degree of cliquishness (high clustering coefficient) Examples: Social networks World Wide Web Scientific collaboration network C.Elegans worm neural network p. 16
Characteristic length Informally: average path length between any pair of nodes. Random graphs short Grid graphs long p. 17
Clustering b a c Informally: it quantifies the probability that, given node a connected to b and c, there is an edge between b and c Random graphs low Grid graphs high p. 18
Structure Diverse meanings Structure vs. random Usually real world problems are said to be structured Attempts to define quantitative measures (entropy, compression ratio, etc.) Graph representation of relations among problem entities p. 19
SATgraphs (a b) (b d) (c d e) a c e b d p. 20
Remember the initial goal.. Previous work [Walsh, 1999] CSP instances defined over small-world graphs are harder to solve for complete algorithms Question: What about local search behavior on small-world instances? p. 21
Experimental issues Small-world SAT instances Procedure to generate instances Measuring small-world property Attacking the benchmark with local search algorithms GSAT WalkSAT ILS-SAT p. 22
Small-world SAT Morphing between a lattice SAT instance and a random SAT instance. [Gent et al., 1999] p. 23
Small-world SAT Length, clustering and proximity ratio (normalized ratio clustering/length) Char. Length, Clustering and Proximity 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Smallworld parameters (500 variables, 1500 clauses) characteristic length (norm.) clustering (norm.) proximity ratio (rescaled) 0 1 10 100 1000 10000 Number of clauses from RandomSAT p. 24
Complete algorithm 1600 100 variables, 300 clauses 5000 200 variables, 600 clauses Search cost 1400 1200 1000 800 600 400 200 0 Search cost 4500 4000 3500 3000 2500 2000 1500 1000 500 0-200 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Proximity ratio -500 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 Proximity ratio 20000 500 variables, 1500 clauses 45000 800 variables, 2400 clauses 18000 40000 Search cost 16000 14000 12000 10000 8000 6000 4000 2000 Search cost 35000 30000 25000 20000 15000 10000 5000 0-5000 0 0 1 2 3 4 5 6 7 8 9 10 Proximity ratio -10000 0 1 2 3 4 5 6 7 8 9 10 Proximity ratio p. 25
Outline of the results No common behavior across different algorithms Mild tendency of small-world and hardness correlation p. 26
GSAT 1e+07 100 variables, 300 clauses 600 200 variables, 600 clauses 1e+06 500 Median iterations (log) 100000 10000 Success rate 400 300 200 1000 100 100 0 50 100 150 200 250 300 Number of clauses fom RandomSAT 0 0 100 200 300 400 500 600 700 Number of clauses from RandomSAT 1000 500 variables, 1500 clauses 500 800 variables, 2400 clauses 900 450 800 400 700 350 Success rate 600 500 400 Success rate 300 250 200 300 150 200 100 100 50 0 0 200 400 600 800 1000 1200 1400 1600 Number of clauses from RandomSAT 0 0 200 400 600 800 1000 1200 1400 1600 Number of clauses from RandomSAT p. 27
WalkSAT 400 100 variables, 300 clauses 100000 200 variables, 600 clauses 350 Median iterations 300 250 200 Median iterations (log) 10000 1000 150 100 0 50 100 150 200 250 300 Number of clauses from RandomSAT 100 0 100 200 300 400 500 600 Number of clauses from RandomSAT 10000 500 variables, 1500 clauses 10000 800 variables, 2400 clauses Median iterations (log) 1000 Median iterations (log) 1000 100 0 200 400 600 800 1000 1200 1400 1600 Number of clauses from RandomSAT 100 0 500 1000 1500 2000 2500 Number of clauses from RandomSAT p. 28
ILS-SAT 3000 100 variables, 300 clauses 1e+07 200 variables, 600 clauses 2500 1e+06 Median iterations 2000 1500 1000 Median iterations (log) 100000 10000 500 1000 0 0 50 100 150 200 250 300 Number of clauses from RandomSAT 100 0 100 200 300 400 500 600 Number of clauses from RandomSAT 1e+06 500 variables, 1500 clauses 1e+07 800 variables, 2400 clauses Median iterations (log) 100000 10000 Median iterations (log) 1e+06 100000 10000 1000 0 200 400 600 800 1000 1200 1400 1600 Number of clauses from RandomSAT 1000 0 500 1000 1500 2000 2500 Number of clauses from RandomSAT p. 29
Discussion Many small-world/lattice SAT instances are harder for GSAT and ILS-SAT WalkSAT exhibits a peculiar behavior The relation between SATgraph and search landscape plays a very important role p. 30
Future work Connections between constraint graph properties and search space characteristics Exploring strengths and weaknesses of the heuristics w.r.t. constraint graph properties Relation between problem encoding and graph properties Alternative formulations to study the structure of a problem can be used (e.g., weighted graphs) p. 31