Algorithm Design (4) Metaheuristics Takashi Chikayama School of Engineering The University of Tokyo
Formalization of Constraint Optimization Minimize (or maximize) the objective function f(x1, …, xn) over values <x1, …, xn>, xk ∈ Dk, that satisfy a condition C(x1, …, xn). An objective function to be minimized is also called a cost function. A value set <x1, …, xn> that satisfies the constraint but may not give the minimum (or maximum) of the objective function is called a feasible solution.
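As a concrete toy instance of this formalization, the sketch below enumerates all assignments over finite domains by brute force; the function name, the constraint, and the cost are all made up for illustration:

```python
from itertools import product

def brute_force_min(domains, constraint, cost):
    """Enumerate every assignment <x1, ..., xn> with xk drawn from Dk and
    return the feasible one with the lowest cost (exponential: tiny cases only)."""
    best, best_cost = None, float("inf")
    for x in product(*domains):
        if constraint(x) and cost(x) < best_cost:
            best, best_cost = x, cost(x)
    return best, best_cost

# Toy instance: minimize x1 + 2*x2  subject to  x1 + x2 >= 3,  xk in {0, 1, 2, 3}
sol, c = brute_force_min(
    [range(4), range(4)],
    constraint=lambda x: x[0] + x[1] >= 3,
    cost=lambda x: x[0] + 2 * x[1],
)
# sol == (3, 0), c == 3: x1 is made as large as allowed, since x2 is twice as costly
```

Exhaustive enumeration like this is the baseline that exact algorithms refine and approximate algorithms avoid.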
Algorithms for Combinatorial Optimization Exact (strict) algorithms: strictly the best solution is found, i.e., no other feasible solution is better; often requires a large computational cost. Approximate algorithms: find a solution hopefully close to the best, i.e., not necessarily the real best; often decreases the computational cost.
Iterative Improvement Methods 1. Find an initial feasible solution, which satisfies the constraint but may be far from optimal 2. The solution is modified a bit without violating the constraints, making the next feasible solution (neighbor solution) 3. Repeat the process until some termination condition is reached Small modifications are expected to lead to better feasible solutions
Simple Iterative Improvement In step 2 of the previous page, always choose the best among the neighbor solutions. Simple and efficient. Known under several names: local search, greedy search, hill climbing.
Local Search 1. If there exists a better feasible solution in the neighborhood of the current solution, make that the current solution. 2. Repeat this until there is no better solution in the neighborhood. Neighborhood: a set of feasible solutions that can be easily derived from the current solution; usually, only some of the variables comprising the solution are modified. A broad neighborhood means a high cost in each step; the neighborhoods should be able to cover all the feasible solutions.
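A minimal local-search sketch (helper names are my own; the cost function and the ±1 neighborhood are a toy example):

```python
def local_search(initial, neighbors, cost):
    """Greedy hill climbing: move to the best neighbor while it improves,
    and stop at the first solution with no better neighbor (a local optimum)."""
    current = initial
    while True:
        best = min(neighbors(current), key=cost)
        if cost(best) >= cost(current):
            return current            # no better neighbor: local optimum
        current = best

# Toy problem: minimize f(x) = (x - 7)^2 over the integers
f = lambda x: (x - 7) ** 2
opt = local_search(0, lambda x: [x - 1, x + 1], f)   # walks 0 -> 1 -> ... -> 7
```

On this unimodal toy cost the search reaches the true optimum 7; on a multimodal cost it would stop at whichever local optimum it descends into first.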
Convergence to Local Optima Local search may result in a locally optimal solution which is far from the global optimum. [Figure: cost over the solution space; the search descends from the initial solution into one of the local optima, missing the global optimum]
Repeated Local Search Repeat local search from multiple randomly chosen initial solutions. [Figure: cost over the solution space; searches from different initial solutions reach different local optima, one of them the global optimum]
Repeated Local Search Quite simple and efficient. High parallelism, as in a parameter survey: parallel trials with different parameters, with no communication or synchronization except for data distribution and solution gathering. Effectiveness depends on the characteristics of the search space and the distribution of initial solutions: can initial solutions be placed close to the optimum? There is no reason to use more complicated metaheuristics if repeated local search is enough.
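A sketch of repeated local search over a toy landscape encoded as a cost table (all names hypothetical); each restart is independent, which is exactly why the method parallelizes with no communication:

```python
import random

def local_search(x, neighbors, cost):
    """Greedy descent to the nearest local optimum."""
    while True:
        best = min(neighbors(x), key=cost)
        if cost(best) >= cost(x):
            return x
        x = best

def repeated_local_search(initials, neighbors, cost):
    """Run one independent local search per initial solution; keep the best."""
    return min((local_search(x0, neighbors, cost) for x0 in initials), key=cost)

# Toy landscape: costs[i] is the cost of solution i; local optima at 1, 4, and 6
costs = [3, 2, 4, 5, 1, 6, 0, 7]
neighbors = lambda i: [max(i - 1, 0), min(i + 1, len(costs) - 1)]
rng = random.Random(1)
initials = [rng.randrange(len(costs)) for _ in range(10)]
result = repeated_local_search(initials, neighbors, costs.__getitem__)
# result is always one of the local optima {1, 4, 6}; with enough restarts
# spread over the space it is the global optimum 6
```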
Metaheuristics Heuristics: methods likely to lead to solutions; no guarantee, however, to find a solution; usually specific to problem areas. Metaheuristics: heuristics independent of problem areas; the same formulation can be applied widely. Search in a space with some appropriate neighborhood notion is considered here.
Simulated Annealing (SA) Local search always takes the best neighbor, which often leads to local optima; allowing a little worse solution may help. Annealing (metallurgy): heating and then slowly cooling increases crystal size and reduces defects. Giving higher energy makes the state jump out of a locally lowest energy state, so that it can go toward lower energy states.
Annealing Balls on a tray are flattened by shaking. [Figure: shaking lets the balls jump out of a pile, lowering their potential energy]
Notions in Algorithms and Physical Counterparts
  Algorithm          | Physics / Metallurgy
  Cost               | Energy level
  Feasible solutions | Physical states
  Optimum            | Ground state
  Local search       | Quenching
  Annealing          | Annealing = slow cooling
Local Search vs. Simulated Annealing [Figure: on the same cost landscape, local search from the initial solution stops at a local optimum, while simulated annealing can climb out of it and reach the global optimum]
Choice of Next Solution in SA A randomly chosen neighbor solution is accepted if it is not too bad. 1. In the neighborhood of the current solution X, randomly choose one solution Y. 2. With a random number r in the range [0, 1] and some constant T (temperature), Y is accepted if the cost improvement Δ = cost(X) − cost(Y) satisfies r ≤ e^(Δ/T). 3. If not, go back to step 1 and repeat. Note that when Y is an improvement (Δ ≥ 0), e^(Δ/T) ≥ 1 and Y is always accepted; worsening moves are accepted only with probability e^(Δ/T) < 1.
Temperature and Acceptance Probability [Figure: acceptance probability e^(Δ/T) plotted against cost improvement Δ (from −10 to 0) for temperatures T = 1 to 10; the higher the temperature, the more readily worsening moves are accepted]
Temperature Scheduling The temperature value is critical: a low temperature does not allow the solution to escape from local optima; a high temperature may make an already good solution much worse. Allow relatively large worsening in the beginning and gradually decrease the allowance: temperature scheduling (start at a high temperature and cool down to a low temperature).
Difficulty of Temp. Scheduling Cooling down too rapidly may leave the search trapped in a local optimum; cooling down slowly takes more computation time. With a large enough constant c, setting the temperature of the n-th step to T_n = c / log n guarantees convergence to the real optimum solution. Unfortunately, this convergence is too slow for practical use.
Conventional Scheduling A frequently used scheme is to decrease the temperature by a constant ratio α (0 < α < 1): T ← αT. There is no good general scheme to decide the value of α; usually, a constant close to 1 (0.999, for example) is used.
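Putting the acceptance rule and geometric cooling together, a minimal SA sketch (function and parameter names are my own, and the multimodal toy cost is made up):

```python
import math
import random

def simulated_annealing(x0, neighbor, cost, t0=10.0, alpha=0.99, steps=5000, seed=0):
    """Accept a random neighbor y with probability min(1, e^(delta/T)),
    delta = cost(x) - cost(y); cool geometrically, T <- alpha*T, each step."""
    rng = random.Random(seed)
    x, t, best = x0, t0, x0
    for _ in range(steps):
        y = neighbor(x, rng)
        delta = cost(x) - cost(y)             # > 0 means y is an improvement
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x = y
        if cost(x) < cost(best):
            best = x                          # remember the best solution seen
        t *= alpha                            # alpha close to 1: slow cooling
    return best

# Toy multimodal cost: local optima at every multiple of 5, global optimum at 0
cost = lambda x: abs(x) + 4 * (x % 5 != 0)
result = simulated_annealing(12, lambda x, rng: x + rng.choice((-1, 1)), cost)
```

While the temperature is still high the search can climb over the cost-4 barriers between multiples of 5; once cooled, it behaves like greedy local search.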
Temperature-Parallel SA Parallel annealing with different temperatures: decent solutions at high temperatures are swapped with not-so-good solutions at lower temperatures. No temperature scheduling is needed. [Figure: a row of replicas from high to low temperature exchanging solutions]
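Temperature-parallel SA corresponds to what is usually called parallel tempering (replica exchange). A hedged sketch of just the swap decision, with my own function names; the acceptance formula is the standard replica-exchange criterion:

```python
import math
import random

def swap_replicas(cost_hot, t_hot, cost_cold, t_cold, rng):
    """Decide whether a hot and a cold replica should exchange their states:
    accept with probability min(1, exp((1/t_cold - 1/t_hot) * (cost_cold - cost_hot))).
    A hot replica that found a better solution is always handed down."""
    delta = (1.0 / t_cold - 1.0 / t_hot) * (cost_cold - cost_hot)
    return delta >= 0 or rng.random() < math.exp(delta)

rng = random.Random(0)
# The hot replica (T = 5) holds a better solution than the cold one (T = 1),
# so the swap is always accepted:
accepted = swap_replicas(cost_hot=1.0, t_hot=5.0, cost_cold=10.0, t_cold=1.0, rng=rng)
```

Swaps that would hand a much worse solution down to a cold replica are accepted only with vanishing probability, which preserves the low-temperature refinement.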
Tabu Search If local search ends up at a local optimum X, we may select some feasible solution in the neighborhood of X as the next candidate. Simply doing this will lead back to X, resulting in an infinite loop. Make recently visited candidates taboo: keep already visited candidates in a list and exclude them from the candidates. The list, however, may grow too long.
Simple Taboo List may Become Too Large Escaping a local optimum may require putting all the feasible solutions around it into the list. [Figure: cost over the space of feasible solutions, with the initial solution, a local optimum, and the global optimum marked]
More Efficient Taboo Condition Settings diff(X, Y): the changes made to move from a feasible solution X to another one Y, e.g., which variables have different values. For a certain period after a move X → Y, diff(Y, X) is kept as a taboo change in the taboo list; this is much smaller than the solutions themselves. With a taboo period of L steps, the search will never have a loop shorter than 2L steps.
Difficulty in Setting the Taboo Period A period too short makes the search likely to loop around local optima. A period too long makes the cost of taboo checking larger and the non-taboo moves within the neighborhood fewer; in the worst case, all moves within the neighborhood may become taboo.
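A tabu-search sketch with a fixed-length taboo list of inverse moves (all names hypothetical); note that it always takes the best non-taboo move even when that worsens the cost, which is what lets it walk through local optima:

```python
from collections import deque

def tabu_search(x0, moves, apply_move, invert, cost, tenure=2, steps=50):
    """Best-non-taboo-move search: after applying move m, its inverse is kept
    taboo for `tenure` steps, so the search cannot immediately undo it."""
    x, best = x0, x0
    tabu = deque(maxlen=tenure)                 # fixed-length taboo list
    for _ in range(steps):
        candidates = [m for m in moves(x) if m not in tabu]
        if not candidates:
            break                               # every move has become taboo
        m = min(candidates, key=lambda m: cost(apply_move(x, m)))
        tabu.append(invert(m))                  # forbid undoing this move
        x = apply_move(x, m)
        if cost(x) < cost(best):
            best = x
    return best

# Toy landscape: solution i has cost costs[i]; moves are -1 / +1 within range
costs = [3, 2, 4, 5, 1, 6, 0, 7]
moves = lambda i: [m for m in (-1, 1) if 0 <= i + m < len(costs)]
result = tabu_search(0, moves, lambda i, m: i + m, lambda m: -m,
                     costs.__getitem__)
# Plain local search from 0 stops at the local optimum 1; tabu search walks
# through it (and through 4) and reaches the global optimum 6
```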
Design of Neighborhood is Essential Both simulated annealing and tabu search can escape from small local optima, but larger local optima are hard to escape from It is essential to design neighborhoods so that smooth transition of neighborhoods will lead to the global optimum With a good neighborhood design, simple local search may also lead to a good solution
Good Neighborhood Design
Different Choices of Neighborhoods for TSP: 2-opt, 3-opt, Or-opt
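A 2-opt sketch for Euclidean TSP (names are my own): repeatedly remove two edges and reconnect the tour by reversing the segment between them, whenever that shortens the tour:

```python
import math

def two_opt(tour, dist):
    """Keep applying improving 2-opt moves until none is left.
    Replacing edges (a,b) and (c,d) by (a,c) and (b,d) reverses tour[i+1..j]."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue                    # same pair of edges: skip
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# Four cities on a unit square; the initial tour 0-1-2-3 crosses itself
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
dist = [[math.hypot(px - qx, py - qy) for (qx, qy) in points]
        for (px, py) in points]
tour = two_opt([0, 1, 2, 3], dist)
length = sum(dist[tour[i]][tour[(i + 1) % 4]] for i in range(4))
# tour == [0, 1, 3, 2], length == 4.0: the crossing is removed and the tour
# becomes the square's perimeter
```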
Multiple Searches in Parallel Local search, annealing, and tabu search all try to gradually improve a single solution. Parallelization is possible by improving multiple feasible solutions in parallel, but then information on the solutions visited in the improvement process is not utilized. → Group optimization
Particle Swarm Optimization Particles move around in a multi-dimensional space, as insects swarm around food Positions in the space have fitness values Particles can exchange information
Particle Swarm Optimization Assumption: Solutions within the neighborhood of a good solution are likely to be good also A particle has acceleration given by some linear combination of the following Direction of the best solution found so far by the particle itself Direction of the best solution found in the neighborhood of the particle Direction of the best ever found globally Some randomness
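The update rule above can be sketched as follows (the parameter values w, c1, c2 and all names are illustrative; for brevity the neighborhood-best term is folded into the global best):

```python
import random

def pso_minimize(cost, dim, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-10.0, 10.0), seed=0):
    """New velocity = inertia + random pull toward the particle's own best
    position + random pull toward the globally best position found so far."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal bests
    gbest = min(pbest, key=cost)[:]              # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # global best
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return gbest

# Sphere function: unimodal, global minimum at the origin
best = pso_minimize(lambda x: sum(v * v for v in x), dim=2)
```

Since gbest only ever improves, the returned cost is monotonically no worse than the best initial particle.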
Utilizing Parts of Solutions When solutions can be decomposed into parts, a feasible solution may have good parts and not-so-good parts, and updating a feasible solution may destroy the good parts. Combining good parts of multiple feasible solutions becomes possible if the search is conducted on a group of solutions. → Genetic algorithms
Genetic Algorithms An algorithm mimicking evolution. 1. Start with a group (population) of a number of initial solutions (individuals). 2. Offspring are made through some alterations: decompose individuals and recombine the parts to make new individuals (mating), plus some random alterations (mutation). 3. Pick better individuals to form the next generation: selection. 4. Loop back to 2 until an appropriate solution is obtained.
Gene, Mating, and Mutation A gene as a list of variables. Mating (crossover): with two genes X = {x1, x2, …, xn} and Y = {y1, y2, …, yn}, choose a cross point k (1 ≤ k < n) at random and make the crossed genes Z = {x1, x2, …, xk, y(k+1), y(k+2), …, yn} and W = {y1, y2, …, yk, x(k+1), x(k+2), …, xn}. Mutation: random changes of values.
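The crossover and mutation above, as a short sketch (helper names are my own):

```python
import random

def crossover(x, y, rng):
    """Single-point crossover: pick a cross point k (1 <= k < n), swap tails."""
    k = rng.randrange(1, len(x))
    return x[:k] + y[k:], y[:k] + x[k:]

def mutate(gene, values, rate, rng):
    """Point mutation: redraw each position from `values` with probability `rate`."""
    return [rng.choice(values) if rng.random() < rate else v for v in gene]

rng = random.Random(0)
z, w = crossover([1, 1, 1, 1], [0, 0, 0, 0], rng)
# Whatever k is drawn, z is 1s followed by 0s and w is 0s followed by 1s;
# together the children carry exactly the genes of the two parents
```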
Selection In principle, individuals with higher fitness (those with lower costs) are chosen. Strict application of the principle will damage gene diversity, which may impede later improvements: not-so-good individuals may have good parts in their genes, and through mating with other individuals the good parts may become apparent. Therefore, introduce some randomness into the selection.
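Tournament selection is one common way to add that randomness (a sketch with my own names): the winner of a small random tournament is selected, so weaker individuals still get through occasionally and diversity survives:

```python
import random

def tournament_select(population, fitness, k, rng):
    """Pick k individuals at random and return the fittest of them.
    Smaller k means weaker selection pressure and more diversity."""
    return max(rng.sample(population, k), key=fitness)

rng = random.Random(0)
population = list(range(10))       # toy individuals; fitness = the value itself
winner = tournament_select(population, lambda x: x, k=3, rng=rng)
# winner is the best of 3 random picks -- usually good, not always the best
```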
Points in Gene Design Results of mating and mutation should frequently represent feasible solutions: the search is inefficient if only a few meet the constraints, and extinction may even result. Relaxing the constraints and reflecting their violation in the cost (as a penalty) may be useful. Crossover should preserve meaningful parts of genes: variables that are closely related should have close locations on the gene.
Genes with Explicit Structures Genes can have structures other than lists Obtaining feasible solutions through mating and mutation could be made easier Tree-structure: Explicit correspondence of parts of solutions and the gene structure
Island Model GA Population is divided into groups GA is applied to each group individually Good individuals are exchanged occasionally Evolution in an area consisting of islands Each group would develop distinctive sets of genes Good parallelism with small communication Reported to be efficient even in sequential environments
Genetic Programming Automatic programming through GA. Programs as tree-structured genes: nodes are primitive operations; leaves are constants and variables. Fitness: how close the program is to the specification. The same algorithm as GA. Good for problems whose fitness to the specification can be stated quantitatively, e.g., finding an expression that explains a data sequence.
There Ain't No Such Thing As A Free Lunch It is impossible to get something for nothing. A 19th-century tradition of saloons in the US was to provide a free lunch to patrons who had purchased at least one drink; the food was salty, and thus the customers usually ended up paying for a lot of beer. When we obtain something for free, it is actually at the expense of something else.
No-Free-Lunch Theorem Wolpert and Macready, 1995 When objective functions are drawn uniformly at random, all algorithms have identical mean performance Algorithms that perform better for some kinds of objective functions must perform worse for some other kinds Algorithms that fit the characteristics of the objective functions should be chosen
Metaheuristics Heuristics not specific to problem domains, so they can be applied to a variety of problems. Still, heuristics is heuristics: there is no guarantee of finding a good solution; whether the formulation fits the problem is the question. Complicated algorithms have higher computational costs; simple repeated local search may work better.