Local search. Heuristic algorithms. Giovanni Righini. University of Milan Department of Computer Science (Crema)


Exchange algorithms
In Combinatorial Optimization every solution x is a subset of E. An exchange heuristic iteratively updates a subset x^(t):
1. it starts from a feasible solution x^(0) obtained in some way (often with a constructive heuristic);
2. it exchanges elements in the current solution with elements not in it, yielding other feasible solutions x_{A,D} = x ∪ A \ D, with A ⊆ E \ x and D ⊆ x;
3. at each step t, it selects which elements must be added and deleted according to a suitable criterion: (A*, D*) = arg min_{A,D} φ(x, A, D);
4. it generates the new current solution x^(t+1) := x^(t) ∪ A* \ D*;
5. when a suitable end test is satisfied, it terminates; otherwise, it goes back to step 2.

Neighborhood
An exchange heuristic is defined by:
- the sets of subsets A and D that can be used, i.e. the subset of solutions that can be generated with an exchange;
- the selection criterion φ(x, A, D).
The neighborhood N : X → 2^X is a function associating a subset of neighbor solutions N(x) ⊆ X with each feasible solution x ∈ X. One can define a search graph in which nodes represent feasible solutions and arcs link each solution x with the solutions in its neighborhood N(x). Given a search graph, a run of the algorithm corresponds to a path; the traversal of an arc is called a move, because it transforms a solution into another one by moving some elements.

Distance-based neighborhoods
Every solution x ∈ X can be represented by its incidence vector:
x_i = 1 if i ∈ x, x_i = 0 if i ∈ E \ x.
The Hamming distance between two incidence vectors x and x' is the number of components in which they differ:
d_H(x, x') = Σ_{i ∈ E} |x_i − x'_i|
With reference to the subsets, this means |x \ x'| + |x' \ x|. The set of solutions whose Hamming distance from x is within a given threshold k is a possible definition of a neighborhood (parameterized on the threshold):
N_{H_k}(x) = {x' ∈ X : d_H(x, x') ≤ k}

An example: the KP
Consider the instance of the KP with E = {1, 2, 3, 4}, w = [5 4 3 2] and W = 10: its solutions are all the subsets of E except {1, 2, 3, 4}, {1, 2, 3} and {1, 2, 4}, which are not feasible. The solution x = {1, 3, 4} (shown in blue in the figure) has a neighborhood N_{H_2}(x) of 7 elements (in pink). The subsets in black do not belong to the neighborhood, because their Hamming distance from x is larger than 2.
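
As a concrete illustration (not part of the original slides), here is a minimal Python sketch of the distance-based neighborhood on the KP instance above: it toggles at most k elements of the incidence vector and keeps the feasible results, recovering the 7 neighbors of x = {1, 3, 4} for k = 2. The helper names are hypothetical.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Hamming distance between two subsets of E: |x \\ y| + |y \\ x|."""
    return len(x ^ y)  # symmetric difference of Python sets

def neighborhood_Hk(x, E, k, feasible):
    """Enumerate N_Hk(x): all feasible subsets within Hamming distance k of x.

    Brute force (toggle every group of at most k elements); x itself,
    at distance 0, is not listed. Only practical for small k and |E|.
    """
    neighbors = []
    for d in range(1, k + 1):
        for group in combinations(E, d):
            y = x ^ set(group)            # toggle the chosen elements
            if feasible(y):
                neighbors.append(y)
    return neighbors

# The KP instance of the slide: weights w and capacity W.
w = {1: 5, 2: 4, 3: 3, 4: 2}
W = 10
feasible = lambda s: sum(w[i] for i in s) <= W

print(neighborhood_Hk({1, 3, 4}, set(w), 2, feasible))  # 7 feasible neighbors
```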

Operations-defined neighborhoods
Another common definition of neighborhood is operational. It is obtained by defining:
- a set O of operations that can be performed on the solutions of the problem;
- the set of solutions generated by applying the operations of O to x:
N_O(x) = {x' ∈ X : ∃ o ∈ O : o(x) = x'}
For the KP, one can define O as:
- insertion of an element of E \ x into x;
- deletion of an element of x from x;
- exchange of an element in x with an element in E \ x.
The resulting neighborhood N_O is related to the distance-based neighborhoods, but it does not coincide with any of them: N_{H_1} ⊆ N_O ⊆ N_{H_2}. These neighborhoods can be parameterized by executing sequences of k operations of O instead of a single one, as with distance-based neighborhoods.

Differences between neighborhoods
In general, operations-based neighborhoods produce solutions at different Hamming distances. For the TSP one can define a neighborhood N_{S_1} containing the solutions that can be obtained by exchanging two vertices in the sequence of visits. The solution x = (3, 1, 4, 5, 2) has neighborhood:
N_{S_1}(x) = {(1, 3, 4, 5, 2), (4, 1, 3, 5, 2), (5, 1, 4, 3, 2), (2, 1, 4, 5, 3), (3, 4, 1, 5, 2), (3, 5, 4, 1, 2), (3, 2, 4, 5, 1), (3, 1, 5, 4, 2), (3, 1, 2, 5, 4), (3, 1, 4, 2, 5)}
With respect to x, three arcs change if the two exchanged vertices are adjacent, four arcs change otherwise.
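
A small Python sketch (illustrative, not part of the slides) that enumerates the vertex-exchange neighborhood N_{S_1} of a tour and counts how many arcs change with respect to x, confirming the three-or-four pattern above:

```python
from itertools import combinations

def swap_neighborhood(tour):
    """N_S1: all tours obtained by exchanging two vertices in the visit sequence."""
    neighbors = []
    for i, j in combinations(range(len(tour)), 2):
        t = list(tour)
        t[i], t[j] = t[j], t[i]
        neighbors.append(tuple(t))
    return neighbors

def arcs(tour):
    """Set of directed arcs of a cyclic tour."""
    n = len(tour)
    return {(tour[i], tour[(i + 1) % n]) for i in range(n)}

x = (3, 1, 4, 5, 2)
for y in swap_neighborhood(x):
    print(y, "changed arcs:", len(arcs(x) - arcs(y)))  # 3 if adjacent, 4 otherwise
```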

Relations between distance-based and operations-based neighborhoods
Sometimes the neighborhoods defined in the two ways coincide:
- for the MDP: N_{H_2}, with solutions at distance 2, and N_{S_1}, defined by the exchange of an element;
- for the BPP: N_{H_2}, with solutions at distance 2, and N_{T_1}, defined by moving an item to a different bin;
and many other examples are possible. This is typical of problems in which all solutions have the same cardinality: one runs a sequence of k exchanges; k elements enter and k elements leave the solution; the Hamming distance between the first and the last solution is 2k.

Different neighborhoods for the same problem: the CMST
The same problem may allow for different operations-based neighborhoods. In the CMST one can:
- exchange edges: (i, j) leaves the solution and (i, n) enters it;
- exchange vertices: n is moved from subtree 2 to subtree 1 (recomputing the edges that reconnect all subtrees at minimum cost).

Different neighborhoods for the same problem: the PMSP
For the PMSP one can define:
- a transfer neighborhood N_{T_1}, based on the set T_1 of moves of a job from one machine to another;
- an exchange neighborhood N_{S_1}, based on the set S_1 of exchanges of jobs between two different machines (one job for each machine).

Connectivity of the search space
An exchange heuristic can always find an optimal solution only if at least one optimal solution is reachable from any initial solution. One says that the search graph is weakly connected to the optimum when for each solution x ∈ X a path from x to x* exists. Since x* is unknown, a stronger condition is often used: the search graph is strongly connected when for each pair of solutions x, y ∈ X a path from x to y exists. An exchange heuristic should guarantee one of these conditions. This is not always possible:
- in the MDP, the neighborhood N_{S_1} allows any pair of solutions to be connected in at most k steps;
- in the KP and the SCP, no neighborhood N_{S_k} guarantees this, because the feasible solutions may have any cardinality; if we also allow for deletions (in the KP) and insertions (in the SCP), then the search graph is connected.

Connectivity of the search space
If feasibility is defined in a sophisticated way, owing to the many constraints of the problem, then deletions, insertions and exchanges of elements may be insufficient: infeasible subsets may interrupt the paths between pairs of feasible solutions.
(Figure: a CMST instance rooted at r, with vertices a, b, c, d, e, f, g and vertex weights w = 1 for all vertices except b, which has w = 2; the three feasible solutions are drawn side by side.)
Given W = 8, there are three feasible solutions, all with two subtrees of weight 4:
x = {(r, a), (a, b), (b, e), (r, d), (c, d), (d, g), (f, g)}
x' = {(r, a), (a, e), (e, f), (f, g), (r, d), (c, d), (b, c)}
x'' = {(r, a), (a, b), (b, c), (r, d), (d, g), (f, g), (e, f)}
The three solutions can be reached from one another only by exchanging two edges at a time; exchanging one edge, only infeasible solutions are reached.

Steepest descent heuristic
The selection criterion φ(x) of the new solution in the neighborhood of the current solution is typically the objective function: at each step, the heuristic moves from the current solution to the best one in its neighborhood. To avoid cycling, one accepts only strictly improving moves.
Algorithm SteepestDescent(I, x^(0))
  x := x^(0); Stop := false;
  While Stop = false do
    x' := arg min_{x'' ∈ N(x)} f(x'');
    If f(x') ≥ f(x) then Stop := true;
    else x := x';
  EndWhile;
  Return (x, f(x));
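
A runnable Python sketch of the SteepestDescent loop above (illustrative: the neighborhood and the objective are passed in as plain functions, which is not how the slides formalize it):

```python
def steepest_descent(x0, neighbors, z):
    """Generic steepest descent for a minimization problem.

    x0        : initial feasible solution
    neighbors : function returning the feasible neighbor solutions of x
    z         : objective function to be minimized
    Returns a locally optimal solution and its value.
    """
    x = x0
    while True:
        best = min(neighbors(x), key=z, default=None)  # best solution in N(x)
        if best is None or z(best) >= z(x):            # only strictly improving moves
            return x, z(x)
        x = best

# Hypothetical usage with the KP neighborhood sketched earlier (maximizing the
# total weight is expressed as minimizing its negation):
# x_best, z_best = steepest_descent({1, 3, 4},
#                                   lambda s: neighborhood_Hk(s, set(w), 2, feasible),
#                                   lambda s: -sum(w[i] for i in s))
```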

Local and global optimality
A steepest descent heuristic terminates when it finds a locally optimal solution, that is a solution x̄ ∈ X such that (assuming minimization)
z(x̄) ≤ z(x) for each x ∈ N(x̄)
A globally optimal solution is also locally optimal, but the converse is not true in general.
(Figure: the objective over the search space, with a local optimum x̄, its neighborhood N(x̄) and a global optimum x*.)

Exact neighborhood
An exact neighborhood is a neighborhood function N : X → 2^X such that every local optimum is also a global optimum. A trivial case occurs when the neighborhood of every solution coincides with the whole feasible region (N(x) = X for each x ∈ X), but this is useless: such a neighborhood is too large to explore. Exact neighborhoods are extremely rare; the only relevant case is the exchange of basic and non-basic variables used by the simplex algorithm for linear programming problems. In general, a steepest descent heuristic finds a local optimum, not a global optimum. Its effectiveness depends on the properties of the search graph and of the objective.

Properties of the search graph
Some relevant properties are:
- the size of the search space |X|;
- the connectivity of the search space (or of the search graph);
- the diameter of the search graph, i.e. the number of arcs of the longest shortest path in it.
For instance, for the symmetric TSP on complete graphs:
- the search space contains |X| = (n − 1)! solutions;
- the vertex-exchange neighborhood contains C(n, 2) = n(n − 1)/2 solutions;
- the diameter of the search graph is n − 2, because any solution can be transformed into any other by at most n − 2 exchanges.
For instance, x = (1, 5, 4, 2, 3) becomes x' = (1, 2, 3, 4, 5) in 3 steps:
x = (1, 5, 4, 2, 3) → (1, 2, 4, 5, 3) → (1, 2, 3, 5, 4) → (1, 2, 3, 4, 5) = x'

Properties of the search graph
Other relevant properties:
- the density of globally optimal solutions (|X*|/|X|) and of locally optimal solutions (|X̄|/|X|, where X̄ denotes the set of local optima): if local optima are many, it is difficult to find global optima;
- the quality of local optima compared with global optima (δ(x̄) = (z(x̄) − z(x*)) / z(x*)), possibly described by an SQD diagram: if local optima are good, it may be less important to find global optima;
- the distribution of local optima in the search space: if local optima are close to one another, it is not necessary to explore the whole space.
The exact evaluation of these indicators would require an exhaustive exploration of the search space. In practice, we limit ourselves to probing it: this analysis may require a lot of time and may provide misleading results.

Example: the TSP
Typical results with the TSP on complete graphs with Euclidean costs:
- the average Hamming distance between two local optima is much smaller than n: local optima are concentrated in a small sub-region of X;
- the average Hamming distance between two local optima is larger than that between local and global optima: global optima are likely to lie in between local optima;
- the FDC (Fitness-Distance Correlation) diagram links the quality δ(x̄) with the distance from the global optima d_H(x̄, X*): better local optima are closer to global optima.

Fitness-Distance Correlation
If the correlation between quality and closeness to global optima is strong:
- it is more convenient to search for good initial solutions, because they guide the local search to good local optima;
- it is better to intensify than to diversify.
On the contrary, if the correlation is weak:
- a good initialization is less important;
- it is better to diversify than to intensify.
This happens, for instance, with the Quadratic Assignment Problem (QAP).

Landscape
A landscape is a triple (X, N, z), where:
- X is the search space, i.e. the feasible region;
- N : X → 2^X is the neighborhood function;
- z : X → ℕ is the objective function.
One can see it as the search graph weighted on the vertices with the objective. The effectiveness of exchange heuristics depends on the landscape: rugged landscapes imply many local optima and hence less effective heuristics.

Different types of landscapes There is a wide variety of landscapes.

Autocorrelation coefficient
The complexity of a landscape can be estimated empirically by:
1. doing a random walk on the search graph;
2. recording the sequence of objective values z^(1), ..., z^(t_max);
3. computing their average value z̄ = (1/t_max) Σ_{t=1..t_max} z^(t);
4. computing the empirical autocorrelation coefficient
r(i) = [Σ_{t=1..t_max−i} (z^(t) − z̄)(z^(t+i) − z̄) / (t_max − i)] / [Σ_{t=1..t_max} (z^(t) − z̄)² / t_max]
It is a function of i that starts from r(0) = 1 and usually decreases.
If r(i) remains close to 1, the landscape is smooth: neighbor solutions have values similar to the current one, there are few local optima and the steepest descent heuristic is effective.
If r(i) decreases rapidly, the landscape is rugged: neighbor solutions have values quite different from the current one, there are many local optima and the steepest descent heuristic is not so effective.
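
A small Python sketch (illustrative, not from the slides) that performs a random walk with a user-supplied neighborhood and computes r(i) exactly as defined above:

```python
import random

def autocorrelation(values, i):
    """Empirical autocorrelation r(i) of a sequence of objective values."""
    t_max = len(values)
    mean = sum(values) / t_max
    num = sum((values[t] - mean) * (values[t + i] - mean)
              for t in range(t_max - i)) / (t_max - i)
    den = sum((v - mean) ** 2 for v in values) / t_max
    return num / den

def random_walk_values(x0, neighbors, z, t_max, seed=0):
    """Objective values z^(1), ..., z^(t_max) along a random walk on the search graph."""
    random.seed(seed)
    x, values = x0, []
    for _ in range(t_max):
        x = random.choice(list(neighbors(x)))
        values.append(z(x))
    return values

# Hypothetical usage, e.g. with the TSP swap neighborhood sketched earlier:
# values = random_walk_values(x0, swap_neighborhood, tour_length, 1000)
# print([round(autocorrelation(values, i), 3) for i in range(1, 11)])
```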

Plateaux
One can analyse the search graph by dividing it into objective levels: a plateau of value z̄ is a subset of solutions of value z̄ that are adjacent in the search graph. Large plateaux hamper the selection of the move, because they make it dependent on the order in which the neighbor solutions are visited. Hence a too smooth landscape is not an advantage!
Example (PMSP): all transfers and exchanges between machines 1 and 3 leave the objective function value unchanged (the other moves worsen it).

Attraction basins
An alternative subdivision of the search graph is based on the concept of attraction basin of a local optimum x̄: the set of solutions x^(0) ∈ X such that the steepest descent heuristic starting from x^(0) terminates in x̄. The steepest descent heuristic is:
- effective if attraction basins are few and large (especially when global optima have larger attraction basins);
- ineffective if attraction basins are many and small (especially if global optima have smaller attraction basins).

Complexity
Algorithm SteepestDescent(I, x^(0))
  x := x^(0); Stop := false;
  While Stop = false do
    x' := arg min_{x'' ∈ N(x)} z(x'');
    If z(x') ≥ z(x) then Stop := true;
    else x := x';
  EndWhile;
  Return (x, z(x));
The complexity of the steepest descent heuristic depends on:
1. the number of steps: this depends on the structure of the search graph (width of the attraction basins), which is difficult to estimate a priori;
2. the selection of a best solution in the neighborhood: this depends on how the search is done.

Exploring the neighborhood
Two main strategies are used:
1. exhaustive search: all neighbor solutions are evaluated; the complexity of each iteration is the product of the number of neighbor solutions (|N(x)|) and the cost of evaluating each of them (γ_N(|E|, x)). Sometimes it is not easy to evaluate only neighbor solutions: one visits a superset of the neighborhood; for each element the feasibility is checked; for the feasible elements the cost is evaluated.
2. efficient exploration of the neighborhood: instead of visiting the whole neighborhood, one finds the optimal neighbor solution by solving an auxiliary problem. Only some special neighborhoods allow for this.

Exhaustive exploration of the neighborhood
Algorithm SteepestDescent(I, x^(0))
  x := x^(0); Stop := false;
  While Stop = false do
    x' := x;    { x' := arg min_{x'' ∈ N(x)} z(x'') }
    For each x̄ ∈ N(x) do
      If z(x̄) < z(x') then x' := x̄;
    EndFor;
    If z(x') ≥ z(x) then Stop := true;
    else x := x';
  EndWhile;
  Return (x, z(x));
The complexity is the product of three terms:
1. the number of iterations t_max needed to reach the local optimum;
2. the number of solutions |N(x^(t))| visited at each iteration;
3. the time γ_N(x^(t), |E|) needed to evaluate the objective.
In general |N(x^(t))| and γ_N(x^(t), |E|) have a maximum which is independent of x^(t).

Evaluating the objective: the additive case
The first expedient to accelerate an exchange algorithm is to minimize the time needed to evaluate the objective. If an exchange inserts or deletes a small number of elements, updating z(x) instead of recomputing it from scratch reduces the cost from γ_N(|E|) to O(1): it is enough
- to add φ_j for each element j inserted into x;
- to subtract φ_j for each element j deleted from x.
In the KP and the CMSTP one can define the neighborhood N_{S_1} generated by the exchange of an element i ∈ x with an element j ∈ E \ x. Moving from x to x' = x \ {i} ∪ {j}, the objective varies by
δ(x, i, j) = z(x \ {i} ∪ {j}) − z(x) = φ(j) − φ(i)
Note that δ(x, i, j) does not depend on x.
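
A tiny illustrative snippet of the additive case for the KP (the function names are hypothetical; phi, w and W denote values, weights and capacity): the objective variation of an exchange is computed in O(1), and keeping the total weight of x also makes the feasibility check O(1).

```python
def kp_swap_delta(phi, i, j):
    """O(1) objective variation when i leaves the knapsack and j enters it."""
    return phi[j] - phi[i]

def kp_swap_feasible(total_weight, w, W, i, j):
    """O(1) feasibility check using the maintained total weight of x."""
    return total_weight - w[i] + w[j] <= W
```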

Example: the symmetric TSP
The neighborhood N_{R_2} (2-opt) for the TSP:
- deletes two non-consecutive edges (s_i, s_{i+1}) and (s_j, s_{j+1});
- inserts the two edges (s_i, s_j) and (s_{i+1}, s_{j+1});
- reverses the direction of the path (s_{i+1}, ..., s_j) (modifying O(n) arcs).
If the cost function is symmetric, the variation of z(x) is
δ(x, i, j) = c_{s_i, s_j} + c_{s_{i+1}, s_{j+1}} − c_{s_i, s_{i+1}} − c_{s_j, s_{j+1}}
In many other cases, however, the function is not additive.
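
A minimal Python sketch (not from the slides) of the constant-time 2-opt evaluation just described, for a tour given as a sequence s and a symmetric cost matrix c:

```python
def two_opt_delta(s, c, i, j):
    """O(1) variation of the tour length when edges (s[i], s[i+1]) and (s[j], s[j+1])
    are replaced by (s[i], s[j]) and (s[i+1], s[j+1]) (symmetric costs)."""
    n = len(s)
    a, b = s[i], s[(i + 1) % n]
    d, e = s[j], s[(j + 1) % n]
    return c[a][d] + c[b][e] - c[a][b] - c[d][e]

def apply_two_opt(s, i, j):
    """Apply the move (assuming i < j): reverse the sub-path s[i+1 .. j], O(n)."""
    return s[:i + 1] + s[i + 1:j + 1][::-1] + s[j + 1:]

# If two_opt_delta(s, c, i, j) < 0, the move strictly improves the tour.
```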

Quadratic functions
In the MDP the objective function is quadratic: if one uses the neighborhood N_{S_1}, moving from x to x' = x \ {i} ∪ {j} the objective varies by
δ(x, i, j) = z(x \ {i} ∪ {j}) − z(x) = (1/2) Σ_{h,k ∈ x\{i}∪{j}} d_{hk} − (1/2) Σ_{h,k ∈ x} d_{hk}
There are O(n) different terms in the two sums. There is a general expedient that works with symmetric quadratic objective functions:
δ(x, i, j) = Σ_{k ∈ x} d_{jk} − Σ_{k ∈ x} d_{ik} − d_{ij} = D_j(x) − D_i(x) − d_{ij}
If one knows D_l(x) = Σ_{k ∈ x} d_{lk} for each l ∈ E, the computation requires O(1) time.

Example: the MDP
We want to evaluate the exchange x → x' = x \ {i} ∪ {j}, with i ∈ x and j ∈ E \ x:
z' = z − D_i + D_j − d_{ij}
- we lose the pairs including i (−D_i);
- we gain the pairs including j (+D_j);
- but the pair (i, j) is counted in excess (−d_{ij}).

Example: the MDP
Update of the data structures after the move:
D'_l = D_l − d_{li} + d_{lj}, for each l ∈ E
Each element l sees d_{li} disappearing and d_{lj} appearing.
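
As an illustration (not part of the slides), a minimal Python sketch of the O(1) evaluation of the MDP exchange and of the update of the auxiliary values D_l; the distance matrix d is assumed symmetric with zero diagonal.

```python
def init_D(d, E, x):
    """D_l(x) = sum of d[l][k] over k in x, for every l in E."""
    return {l: sum(d[l][k] for k in x) for l in E}

def delta_swap(D, d, i, j):
    """O(1) variation of the MDP objective when i leaves x and j enters it."""
    return D[j] - D[i] - d[i][j]

def apply_swap(D, d, E, x, i, j):
    """Perform the move and update D in O(|E|): D'_l = D_l - d[l][i] + d[l][j]."""
    x.remove(i)
    x.add(j)
    for l in E:
        D[l] += d[l][j] - d[l][i]
```

Evaluating every candidate exchange thus costs O(1), while the update after the selected move costs O(|E|).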

Use of auxiliary information
Other non-linear functions can also be updated by:
- keeping aggregate information on the current solution;
- using this information to compute z efficiently;
- updating such information when moving to the next solution.
For the PMSP with the transfer neighborhood N_{T_1} and the exchange neighborhood N_{S_1}, one can evaluate the objective in constant time by keeping and updating:
- the completion time of each machine;
- the index of the machines with the two largest completion time values.

Example: the PMSP
Consider the exchange o = (i, j) of jobs i and j, with i on machine M_i and j on machine M_j:
- the new completion times can be computed in constant time: one of them increases and the other one decreases (or they remain unchanged);
- one can check in constant time whether one of them exceeds the maximum completion time;
- if the maximum completion time decreases, one can check in constant time whether the other involved machine or the machine with the second largest completion time becomes the new maximum.
Once the whole neighborhood has been visited and the move selected, it is necessary to update the completion times (in constant time: only two of them change).
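
A possible Python sketch of this constant-time evaluation (illustrative assumptions: machine loads C, job processing times p, a job-to-machine assignment, and the indices of the two most loaded machines are maintained):

```python
def exchange_makespan(C, p, assign, i, j, first, second):
    """O(1) makespan evaluation for the exchange of jobs i and j in the PMSP.

    C             : completion time of each machine
    p             : processing time of each job
    assign        : machine currently processing each job
    first, second : machines with the largest and second largest completion times
    """
    mi, mj = assign[i], assign[j]
    new_Ci = C[mi] - p[i] + p[j]
    new_Cj = C[mj] - p[j] + p[i]
    # Largest completion time among the machines not involved in the move.
    # Tracking 'second' suffices: when both top machines are involved,
    # max(new_Ci, new_Cj) >= (C[mi] + C[mj]) / 2 >= any other machine.
    if first not in (mi, mj):
        untouched_max = C[first]
    elif second not in (mi, mj):
        untouched_max = C[second]
    else:
        untouched_max = 0
    return max(new_Ci, new_Cj, untouched_max)
```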

Use of auxiliary information
The auxiliary information can be about:
- the current solution x;
- the previous solution in the neighborhood, according to a suitable ordering.
Consider the neighborhood N_{R_2} for the symmetric TSP:
- the neighbor solutions differ from x by O(n) arcs;
- the solutions in the neighborhood differ from one another by O(n) arcs;
- if the edge pairs (s_i, s_{i+1}) and (s_j, s_{j+1}) follow the lexicographical order, the reversed path changes by only one edge from one neighbor to the next.
(Figure: the tours π_0 ... π_i π_{i+1} ... π_j π_{j+1} ... π_0 corresponding to two consecutive moves; in the second one the reversed path is extended by vertex π_{j+2}.)

Example: the asymmetric TSP
(Figure: the tours corresponding to two consecutive exchanges, (i, j) and (i, j + 1).)
In general, the variation of z(x) is
δ(x, i, j) = c_{s_i, s_j} + c_{s_{i+1}, s_{j+1}} − c_{s_i, s_{i+1}} − c_{s_j, s_{j+1}} + c_{s_j ... s_{i+1}} − c_{s_{i+1} ... s_j}
where c_{s_{i+1} ... s_j} and c_{s_j ... s_{i+1}} are the costs of the reversed path traversed in the two directions. When we have considered exchange (i, j) and we consider exchange (i, j') with j' = j + 1:
- the first four terms change, but they are data;
- the last two terms can be updated in constant time:
c_{s_{j'} ... s_{i+1}} = c_{s_j ... s_{i+1}} + c_{s_{j+1}, s_j}
c_{s_{i+1} ... s_{j'}} = c_{s_{i+1} ... s_j} + c_{s_j, s_{j+1}}

Feasibility
Some operations done to explore the neighborhood may generate infeasible solutions:
Ñ_O(x) = {x' ∈ 2^E : ∃ o ∈ O : o(x) = x'}
N_O(x) = Ñ_O(x) ∩ X
In this case, for each element of Ñ_O(x) one needs:
- to check its feasibility;
- if it is feasible, to evaluate its cost.
To check feasibility one can use the same techniques used for the objective.

Example: the CMSTP
Consider the neighborhood N_{S_1} that inserts an edge and deletes another one:
- if the two edges belong to the same branch, the solution remains feasible;
- if they belong to different branches, one branch loses weight and the other one gains it: the variation is equal to the weight of the transferred sub-tree.
If we keep the weight of the sub-tree rooted at each vertex, it is enough to compare this weight with the residual capacity of the branch that receives it. This piece of information must be updated once the move has been done: it takes O(n) time.
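
A tiny Python sketch (illustrative; all names are hypothetical) of the constant-time feasibility check described above:

```python
def vertex_move_feasible(subtree_weight, branch_weight, Q, v, target_branch):
    """O(1) feasibility check: the subtree rooted at v fits into the target branch.

    subtree_weight[v] : weight of the subtree rooted at v (refreshed in O(n) after a move)
    branch_weight[b]  : total weight currently assigned to branch b
    Q                 : capacity of each branch
    """
    return branch_weight[target_branch] + subtree_weight[v] <= Q
```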

Refined heuristic
The use of additional information implies:
1. the initialization of suitable
- local data structures, related to the exploration of each neighborhood;
- global data structures, related to the whole search process;
2. their update from one solution to another or from one iteration to another.
Algorithm SteepestDescent(I, x^(0))
  x := x^(0); GD := InitializeGD(); Stop := false;
  While Stop = false do
    x' := x; δ(x') := 0; LD := InitializeLD();
    For each x̄ ∈ N(x) do
      If δ(x̄) < δ(x') then x' := x̄;
      LD := UpdateLD(LD);
    EndFor;
    If z(x') ≥ z(x) then Stop := true;
    else x := x'; GD := UpdateGD(GD);
    EndIf;
  EndWhile;
  Return (x, z(x));

Partial conservation of the neighborhood
When an operation o ∈ O is executed on a solution x, it often happens that the variation δ(x, o) of the objective function does not depend on x, or depends only on a part of x: many operations o' ∈ O executed on the new solution x' = o(x) produce δ(x', o') = δ(x, o'). In this case, it is convenient:
1. to store all values of δ(x, o) as they are computed;
2. to perform the best move, generating x';
3. to delete the values for which δ(x', o') ≠ δ(x, o');
4. to recompute only the deleted values;
5. to go back to step 2.

Example: the CMST
Consider the neighborhood N_{S_1} for the CMST:
- insert an edge j ∈ E \ x;
- delete an edge i ∈ x.
The exchanges involving only branches not affected by the move produce the same effect: δ(x', i, j) = δ(x, i, j). Therefore it is possible:
- to keep the set of the feasible exchanges;
- to delete from this list the exchanges involving one or both of the branches associated with the move;
- to recompute only the effect of those exchanges.

The efficiency-efficacy trade-off
The complexity depends on three factors:
1. the number of local search iterations;
2. the size of the neighborhood to be explored;
3. the complexity of evaluating each solution.
The former two are conflicting:
- a large neighborhood allows for few steps (or better solutions);
- a small neighborhood implies many steps.
The optimal trade-off is somewhere in between: we need a neighborhood
- large enough to allow good solutions to be reached;
- small enough to allow for a quick selection of the move.
In general it is difficult to determine a priori what the best trade-off is.

Fine tuning the neighborhoods
It is also possible to fine tune the size of a given neighborhood N: one explores only a promising subset N' ⊆ N. For instance, one can:
- insert only elements j ∈ E \ x with cost φ(j) low enough;
- delete only elements i ∈ x with cost φ(i) high enough;
- terminate the exploration as soon as the best neighbor found so far is promising enough.
For instance, one can apply the first-improvement strategy: the exploration of the neighborhood is stopped as soon as a solution better than the current one is found:
If z(x̄) < z(x) then Stop := true;
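
A minimal Python sketch (illustrative, not from the slides) of the first-improvement exploration of a neighborhood:

```python
def first_improvement(x, neighbors, z):
    """Return the first strictly improving neighbor of x, or None if there is none.

    The result depends on the order in which the neighbors are visited.
    """
    zx = z(x)
    for y in neighbors(x):
        if z(y) < zx:
            return y          # accept the first improving solution immediately
    return None               # x is a local optimum
```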

Fine tuning the neighborhoods
The effectiveness depends on the objective: if the cost of some elements heavily affects the objective, it may be worth fixing or forbidding them. It also depends on the neighborhood:
- if the landscape is smooth, the first improving neighbor is not likely to be much worse than the best improving one;
- if the landscape is rugged, the best solution in the neighborhood can be much better than the first improving one.