A Late Acceptance Hill-Climbing algorithm: the winner of the International Optimisation Competition


The University of Nottingham, Nottingham, United Kingdom

A Late Acceptance Hill-Climbing algorithm: the winner of the International Optimisation Competition

Yuri Bykov
16 February 2012
ASAP group research seminar

The contents of the presentation

- The Magic Square (MS) problem: unconstrained and constrained variants.
- The International Optimisation Competition and its results.
- Understanding the MS problem.
- The decomposition of the MS problem.
- Choosing the best optimization heuristic for the MS problem.

Magic Square

The task is to place the numbers from 1 to N² into an N×N square matrix so that the sums of the numbers in all rows, columns and diagonals are equal to the Magic Number: N*(N²+1)/2.

Magic Squares have been known for over 4,000 years. In various ancient cultures they had astrological significance, and Magic Squares engraved on stone or metal were worn as talismans to ward off disease. From ancient times they have been widely studied by mathematicians, and they were widely used as an element of art by painters, sculptors and architects.
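(As a small worked example, not from the original slides: for N = 3 the Magic Number is 3*(3²+1)/2 = 15, and indeed every row, column and diagonal of the classic 3×3 square shown on a later slide sums to 15.)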

Unconstrained and constrained Magic Square

[figure: a 3×3 grid holding the numbers 1 to 9]

A simple (unconstrained) Magic Square can be solved with a variety of quick exact algorithms; the oldest one (the Siamese method) has been known in Europe since 1693.

A more complex variant is the constrained Magic Square. There are many ways to put constraints on a Magic Square; in one variant, the solution must contain a given submatrix in a given position. No exact polynomial algorithm is known for this problem, so we can try heuristic methods to solve it.
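For reference, the Siamese method mentioned above fits in a few lines. This is the standard textbook construction for odd N, sketched here in Java as my own illustration (not part of the competition code):

```java
// Siamese method for odd N: place 1 in the middle of the top row, then
// keep stepping up-and-right (wrapping around); if the target cell is
// occupied, step down instead.
static int[][] siamese(int n) {               // n must be odd
    int[][] m = new int[n][n];
    int r = 0, c = n / 2;                     // start: middle of the top row
    for (int k = 1; k <= n * n; k++) {
        m[r][c] = k;
        int nr = (r - 1 + n) % n;             // step up...
        int nc = (c + 1) % n;                 // ...and right, with wrap-around
        if (m[nr][nc] != 0) {                 // occupied: step down instead
            nr = (r + 1) % n;
            nc = c;
        }
        r = nr;
        c = nc;
    }
    return m;
}
```

For n = 3 this produces the square 8 1 6 / 3 5 7 / 4 9 2, whose every line sums to 15.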

Magic Square competition

The International Optimization Competition was organized in November 2011 and hosted by SolveIT Software Pty Ltd. Its purpose was to promote modern heuristic optimization methods among the research community.

The task was to develop an algorithm able to solve the largest constrained Magic Square problem within one minute. The algorithm had to be submitted as a Java command-line application. Submitted applications were first tested on Magic Squares of sizes 20x20, 100x100 and 200x200, and then on progressively larger sizes to determine the maximum size solvable within 1 minute. The first place award was 5000 AUD. The details of the competition are available on the web at: http://www.solveitsoftware.com/competition.jsp.

The Results of the Competition

The competition results became available on 19 December 2011. (The competition entry requirement was a 200x200 square.)

- The 2nd runner-up was Xiaofeng Xie from Carnegie Mellon University: his program solves the constrained version of a 400x400 magic square within a minute.
- The 1st runner-up was Geoffrey Chu from the University of Melbourne: his program solves the constrained version of a 1,000x1,000 magic square within a minute.
- The winner of the competition was Yuri Bykov from the University of Nottingham: his program solves the constrained version of a 2,600x2,600 magic square within a minute.

Why 2600x2600?

The competition rules required composing the MS and writing the result into a text file, and this restricted the largest achievable size. A text file containing a 2600x2600 Magic Square is 52 MB in size, and the Java procedure for writing text files is quite slow: it takes 40 seconds just to output the solution, while the actual solving procedure takes less than 20 seconds. (The winning size of 2,600 is 2.6 times larger than the 1st runner-up's 1,000.)

The largest MS solved by the Java program is 3300x3300; for larger sizes the Java virtual machine reports a memory error. The same program written in Delphi was able to solve an MS of size 7800x7800 in less than one minute, but this limit was also set by the maximum available PC memory. On better hardware this algorithm could solve an even larger MS within one minute.

How to create such an algorithm?

The following five steps are necessary:

- Understand the problem.
- Find the right problem representation.
- Formulate a suitable cost function.
- Select an effective set of moves.
- Choose a proper optimization heuristic.

Understanding the problem

We can say the following about the MS problem:

[figure: the classic 3×3 magic square
 4 9 2
 3 5 7
 8 1 6]

In essence, the MS problem is a typical permutation problem; it differs from TSP, Quadratic Assignment, Flowshop Scheduling, etc. only in its cost function. The cost function is not given explicitly, so we can define and use any cost function whose global minimum corresponds to the target MS.

This is a constraint satisfaction problem rather than a minimization one, i.e. the goal is the global optimum only: no local optimum will satisfy us, regardless of how close it is to the global one. However, heuristic search methods usually provide near-optimum solutions, so to reach the global optimum by heuristic search we have to employ a reheating mechanism.

Reheating

If the search converges before reaching the global optimum, the control parameters are reassigned and a new search attempt is started. The attempts continue until the global optimum is achieved.

[figure: cost vs. number of moves over repeated reheated attempts]

With a longer attempt we are more likely to get a better solution; with a shorter attempt we can run more attempts in the same time. Thus, to achieve the global optimum in minimum time we should find the right balance between the number of attempts and their length. A minimal sketch of this restart scheme follows below.
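The following is a hypothetical outline of search-with-reheating (my sketch, not the author's code). Here `attempt` runs one search until convergence and returns its final cost, and `reheat` relaxes the control parameters before the next try; both are assumed hooks:

```java
import java.util.function.IntSupplier;

// Restart loop: keep launching search attempts until one of them
// reaches cost 0, i.e. the target Magic Square.
static void solveWithReheating(IntSupplier attempt, Runnable reheat) {
    while (attempt.getAsInt() > 0) {   // cost 0 == the global optimum
        reheat();                      // e.g. raise the temperature or the
    }                                  // fitness-array values by 10%
}
```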

Simplistic approach

The simplest variant is to solve the entire problem in a single phase by a one-point heuristic search:

- Initialization: the numbers are put randomly into the square matrix (the constraint matrix is placed into its given position).
- Moves: swapping two randomly chosen numbers (the elements of the constraint matrix are not moved).
- Cost function (C): the sum of the absolute deviations of the sums of rows, columns and diagonals from the Magic Number. A sketch of this cost function is given below.
- Acceptance condition: can be HC, SA, or LAHC.

[flowchart: Start → Initialization (s ← s0) → generate a candidate solution s' → calculate the candidate cost C' → apply the acceptance condition; if accepted, s ← s'. On convergence, if C = 0, Stop; otherwise reheat and continue.]

Unfortunately this approach is not powerful enough (max N = 50-70 in one minute).
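A sketch of this cost function in Java (my illustration, not the original code), assuming the square is stored as an int[n][n] already holding a permutation of 1..n²:

```java
// Sum of absolute deviations of every row, column and diagonal sum
// from the Magic Number. Cost 0 means the matrix is a Magic Square.
// (For N above roughly 1600, use long to avoid integer overflow.)
static int cost(int[][] m) {
    int n = m.length;
    int magic = n * (n * n + 1) / 2;      // the Magic Number N*(N^2+1)/2
    int c = 0, d1 = 0, d2 = 0;
    for (int i = 0; i < n; i++) {
        int row = 0, col = 0;
        for (int j = 0; j < n; j++) {
            row += m[i][j];
            col += m[j][i];
        }
        c += Math.abs(row - magic) + Math.abs(col - magic);
        d1 += m[i][i];                    // main diagonal
        d2 += m[i][n - 1 - i];            // anti-diagonal
    }
    return c + Math.abs(d1 - magic) + Math.abs(d2 - magic);
}
```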

The golden rule of heuristic optimization

If you can decompose the problem, do it! This is the most effective way to improve the quality of results.

A decomposition approach

The Magic Square problem can be decomposed into a number of Border Square, or Magic Frame, problems.

The Magic Frame (MF) is an N×N square matrix where only the border rows and columns are non-zero. The sum of the numbers in each border row and column is equal to the Magic Number; the sum of the numbers in every other row, column and diagonal is equal to N*N+1. For example (N = 5):

20  2  3 18 22
 1           25
19            7
21            5
 4 24 23  8  6

The MF contains 2*(N-1) numbers A_i <= N*N/2 and the same number of their counterparts B_i = N*N+1-A_i, placed symmetrically. The MF can be composed from any given set of numbers satisfying the above conditions, and it can contain constraints (e.g. those specified for the complete Magic Square).

The main property of the Magic Frames

If we take an N×N Magic Frame and place inside it an (N-2)×(N-2) Magic Square composed from the remaining numbers, we get an N×N Magic Square (a worked example follows below). Correspondingly:

- An MS can be composed by nesting MFs one inside another, with a smaller MS placed inside the set of MFs.
- If the constraint matrix is placed not far from the border, we can construct several constrained MFs and then fill the centre by a quick exact method.
- If the constraint matrix lies deeper inside, we can construct several unconstrained MFs and then use the simplistic approach to insert a small constrained MS in the centre.
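As a worked illustration (mine, not from the slides), take the 5×5 frame shown on the previous slide. The numbers it leaves unused are 9 to 17, which form a 3×3 magic square with line sum 39 (the classic 3×3 square with 8 added to every entry). Placing it in the centre yields a complete 5×5 Magic Square with Magic Number 65 = 39 + 26:

20  2  3 18 22
 1 12 17 10 25
19 11 13 15  7
21 16  9 14  5
 4 24 23  8  6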

Moving elements within the Magic Square

With deeply placed constraints we can also use the following method:

If we take four points A, B, C and D that form the vertices of a rectangle, do not lie on the diagonals, and satisfy A+B = C+D (with A, B in one row and C, D in the other), then we can swap A with C and B with D without disturbing the magic properties of the square.

[figure: before and after such a swap, the pair (1, 55) exchanged with the pair (39, 17); note 1+55 = 39+17 = 56]

By this method we can place the constraints close to the border, compose the Magic Square, and then move the constraints into the required position. If some of these points do lie on the diagonals, the same procedure is repeated to repair the Magic Square.
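A sketch of this operation (my illustration, with the convention A = m[r1][c1], B = m[r1][c2], C = m[r2][c1], D = m[r2][c2]):

```java
// Swap two pairs of equal-sum entries that form a rectangle. Each swap
// happens within a column, so column sums never change; row sums are
// unchanged because A+B == C+D.
// Precondition: m[r1][c1] + m[r1][c2] == m[r2][c1] + m[r2][c2],
// and none of the four cells lies on a diagonal.
static void rectangleSwap(int[][] m, int r1, int r2, int c1, int c2) {
    int t = m[r1][c1]; m[r1][c1] = m[r2][c1]; m[r2][c1] = t;  // A <-> C
    t = m[r1][c2]; m[r1][c2] = m[r2][c2]; m[r2][c2] = t;      // B <-> D
}
```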

Heuristic search for the Magic Frame problem

It differs from the simplistic approach in the following points:

- At the initialization step the algorithm selects the necessary numbers A_i and puts them randomly into the Magic Frame. Constraints are put into their required places. Counterparts B_i are placed symmetrically to A_i.
- To calculate the cost function, we first calculate the sums of the numbers in the first row and the first column. The cost is then the sum of the absolute deviations of these sums from the Magic Number. We do not need to count the last row and the last column, as they give the same cost.

Example (5×5, Magic Number 65):

 1  2  3 18 22   (first row sum = 46)
 6           20
19            7
21            5
 4 24 23  8 25   (first column sum = 1+6+19+21+4 = 51)

Cost = |65-51| + |65-46| = 33
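In Java this frame cost might look as follows (my sketch, not the original code; the zero interior cells contribute nothing, and the opposite row and column mirror these sums because counterparts always sum to N*N+1):

```java
// Deviations of the first row and first column from the Magic Number.
static int frameCost(int[][] m) {
    int n = m.length;
    int magic = n * (n * n + 1) / 2;
    int row = 0, col = 0;
    for (int j = 0; j < n; j++) {
        row += m[0][j];      // first (top) row
        col += m[j][0];      // first (left) column
    }
    return Math.abs(row - magic) + Math.abs(col - magic);
}
```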

Heuristic search for the MF problem (cont.)

Solutions are modified by two types of moves:

- 1st type: a randomly chosen number is swapped with its counterpart.
- 2nd type: the algorithm randomly chooses two numbers and swaps them together with their counterparts.

The type of the move is selected randomly, and constraints are not moved. Both types of move preserve the feasibility of the MF, i.e. A_i + B_i = N*N+1 holds throughout the search procedure.

[figure: two 5×5 frames illustrating the moves, before and after]
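One compact way to encode these moves (my sketch, not the original code) is to store only the numbers placed in the "primary" free cells: each mirrored cell implicitly holds the counterpart nn1 - a[i], where nn1 = N*N+1, so both move types preserve feasibility by construction. Constraint cells are simply excluded from the array:

```java
import java.util.Random;

// Apply one randomly chosen move to the frame encoding 'a'.
static void applyRandomMove(int[] a, int nn1, Random rnd) {
    if (rnd.nextBoolean()) {
        // 1st type: swap a randomly chosen number with its counterpart
        int i = rnd.nextInt(a.length);
        a[i] = nn1 - a[i];
    } else {
        // 2nd type: swap two numbers (their counterparts swap implicitly)
        int i = rnd.nextInt(a.length), j = rnd.nextInt(a.length);
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```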

The scheme of the entry algorithm

The entry algorithm has to solve constrained and unconstrained MS problems of any size.

[flowchart: Start → is the problem constrained? → is the problem size < 20? → are the constraints near the border? → is it an evenly even problem? Depending on the branch, the algorithm applies the simplistic approach to the whole MS, or solves one or several outer MFs (replacing constraints where necessary) and fills the centre with a quick constructive method → Stop]

Choosing an optimization heuristic

Three candidate heuristics were tested: Greedy Hill-Climbing (HC), Simulated Annealing (SA), and the Late Acceptance Hill-Climbing (LAHC).

A suitable heuristic should provide a stable result within 1 minute on MS problems over a wide range of sizes, from 25x25 to 7000x7000; exceeding 1 minute is regarded as a failure. The algorithm should run in a fully automated mode, so the algorithmic parameters should be independent of the size of the problem. The parameterization is complicated because we have a search with reheating.

All tests are run on the most complex constrained MS: the constraint matrix is placed into position (4,4). Here the algorithm consecutively solves 7 outer MFs.

Late Acceptance Hill-Climbing (LAHC)

This is a new optimization heuristic, proposed in 2008:

- All improving candidates are accepted.
- A worse candidate is accepted if it is better than the solution that was current several iterations ago.
- Previous current costs are stored in a list (the fitness array) of length Lfa; the candidate cost is compared with the last element of the list.
- Lfa is the single input parameter of this algorithm.
- During reheating, all elements of the fitness array are increased by 10% of their initial values.

Hence, we have to set up just one thing: the length of the fitness array, Lfa. A sketch of the acceptance loop follows below.
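This is the acceptance rule in Java (my illustration based on the description above, not the competition code). The solution is an int[] for concreteness; `evaluate` and `neighbour` are problem-specific hooks supplied by the caller, and reheating is omitted for brevity:

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

// Core LAHC loop: accept a candidate if it improves on the current cost
// or on the cost that was current Lfa iterations ago.
static int[] lahc(int[] init, int lfa,
                  ToIntFunction<int[]> evaluate,
                  UnaryOperator<int[]> neighbour) {
    int[] s = init;
    int c = evaluate.applyAsInt(s);
    int[] fa = new int[lfa];                 // fitness array of length Lfa
    Arrays.fill(fa, c);
    for (long it = 0; c > 0; it++) {         // cost 0 == target square found
        int[] cand = neighbour.apply(s);
        int cc = evaluate.applyAsInt(cand);
        int v = (int) (it % lfa);            // cost stored Lfa iterations ago
        if (cc <= c || cc <= fa[v]) {        // improving, or better than the
            s = cand;                        // old current cost: accept
            c = cc;
        }
        fa[v] = c;                           // record the new current cost
    }
    return s;
}
```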

The parameterization of the LAHC

This is quite straightforward: run the algorithm a number of times with different Lfa, record the processing time, and choose the Lfa that gives the shortest time.

[two plots: solution time (0-60 sec) vs. Lfa (0-40,000) for sizes 25x25 and 1000x1000]

With relatively small problems the LAHC works well with any value of Lfa (except for very small values). This also demonstrates that plain HC (i.e. LAHC with Lfa = 1) is not suitable for the MS problem.

The parameterization of the LAHC (cont.)

The tests continued with the largest problem (7000x7000). In the majority of cases the LAHC produced a result in less than 30 sec; only one run out of 3,669 lasted over 60 sec, giving a reliability of 99.973%. The experiments did not reveal a clear optimum for Lfa; an Lfa of 10,000-20,000 appears good enough for all problem sizes.

[plot: solution time (0-60 sec) vs. Lfa (0-40,000) for size 7000x7000]

Simulated Annealing

SA is the second most studied metaheuristic (after Genetic Algorithms):

- All improving candidates are accepted.
- Worse candidates are accepted with probability P = exp[(C-C')/T_j], where C is the current cost, C' the candidate cost, and T_j the temperature at the j-th iteration.
- In the logarithmic cooling schedule, the temperature for the next iteration is T_{j+1} = T_j/(1+λT_j).
- The cooling factor λ can be calculated as λ = (T_i-T_f)/(N_tot*T_i*T_f), where T_i is the initial temperature, T_f is the final temperature and N_tot is the total number of iterations required to pass from T_i to T_f.
- During reheating, the temperature is increased by 10% of its initial value.

Thus, before running SA we have to set up three parameters: the initial temperature T_i, the final temperature T_f, and the total number of iterations N_tot. A sketch is given below.
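In Java, SA with this logarithmic cooling schedule might look as follows (my sketch, using the same hypothetical `evaluate` and `neighbour` hooks as in the LAHC sketch; reheating omitted for brevity):

```java
import java.util.Random;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

// SA loop with logarithmic cooling: lambda is derived from Ti, Tf, Ntot.
static int[] anneal(int[] init, double ti, double tf, long ntot,
                    ToIntFunction<int[]> evaluate,
                    UnaryOperator<int[]> neighbour, Random rnd) {
    double lambda = (ti - tf) / (ntot * ti * tf);           // cooling factor
    int[] s = init;
    int c = evaluate.applyAsInt(s);
    for (double t = ti; c > 0; t = t / (1 + lambda * t)) {  // T_{j+1}=T_j/(1+λT_j)
        int[] cand = neighbour.apply(s);
        int cc = evaluate.applyAsInt(cand);
        // improving moves always accepted; worse with P = exp((C-C')/T_j)
        if (cc <= c || rnd.nextDouble() < Math.exp((c - cc) / t)) {
            s = cand;
            c = cc;
        }
    }
    return s;
}
```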

The initial and final temperatures of SA

The literature suggests that the initial temperature should be set so that around 85% of non-improving moves are accepted. However, tests have shown that the optimal initial temperature varies greatly across problem sizes:

MS size   Optimal T_i
25        2200
200       210000
1000      5.8*10^6
2600      39*10^6
7000      290*10^6

We have two hypothetical options for setting the optimal T_i for each MS size: develop a special algorithm for adjusting T_i, or derive an analytical formula for T_i by regression. Both options require significant extra effort, and it is unknown how beneficial they would be for the MS problem, so these experiments were postponed.

The final temperature should guarantee the convergence of the search. The convergence temperature also differs greatly across runs and MS sizes; during tests it was never lower than 0.5, so this value was assumed for T_f.

Testing the total number of iterations for SA

With T_i and T_f fixed, the algorithm was run a number of times while varying N_tot.

[two plots: solution time (0-60 sec) vs. N_tot (0-80 million) for sizes 25x25 and 200x200]

With small problems SA works well (similarly to the LAHC), but the optimal value of N_tot is not completely clear from these diagrams.

Problems of larger size

When the run time exceeded 60 sec the search was stopped and a failure was recorded.

[two plots: solution time (0-60 sec) vs. N_tot (0-80 million) for sizes 1000x1000 and 2600x2600]

With the 1000x1000 problem the search fails in 5% of cases; with the 2600x2600 problem only 38% of runs were successful.

The largest problem

With the 7000x7000 problem SA fails in 93% of cases: the reliability is only 7%. These experiments did not reveal any beneficial value of N_tot for the large-sized problems.

[plot: solution time (0-60 sec) vs. N_tot (0-80 million) for size 7000x7000]

A hypothesis: maybe the T_i taken from the literature is not suitable here? To check this, it is worth testing T_i in the same way.

Testing the initial temperature

N_tot was set to 20 million and SA was run while varying T_i.

[two plots: solution time (0-60 sec) vs. T_i (in millions) for sizes 2600x2600 and 7000x7000]

With the 2600x2600 problem the search fails in 62% of cases; with the 7000x7000 problem only 7% of runs were successful. The experiments again did not reveal an optimal value of T_i.

The last attempt to tune SA

Finally, SA was run while varying both T_i and N_tot. This last attempt still did not reveal any region of T_i and N_tot where SA does not fail: SA failed in 83% of runs, and the failures are distributed uniformly over the experimental space.

[scatter plot for size 7000x7000: N_tot (0-80 million) vs. T_i (0-200 million), points marked as T <= 60 sec or T > 60 sec]

Although the parameterization of SA took a lot of time and effort, it did not yield a positive result. It can be concluded that SA performs very poorly on large MS problems regardless of the parameter settings.

Final comparison of the LAHC with the SA

Several qualifiers were taken into account:

Qualifier                            LAHC                     SA
Implementation                       Simple                   Simple
Productivity                         8*10^6 iterations/sec    4.9*10^6 iterations/sec (because of the time-expensive exponent calculation)
Number of parameters                 1                        3
Parameterization                     Easy                     Hard
Reliability with small MS problems   High                     High
Reliability with large MS problems   High                     Low
Suitable for the competition entry   Yes                      No

Conclusions

An example was already known of a non-linearly rescaled Exam Timetabling problem where SA fails but the LAHC works well (see: http://www.cs.nott.ac.uk/~yxb/lahc/). The Magic Square problem is a second example where the LAHC clearly outperforms SA.

It can be proposed that the LAHC is more suitable than SA for very large optimization problems. This is not surprising, because the LAHC has a unique property: it combines the power of one-point searches with the reliability of ranking-based methods.

You can check everything for yourself: the original Java code is available on the web page above.

Acknowledgement

This was my first program written in Java, and I would like to thank Matthew Hyde, who helped me with my first steps in Java.

Any questions?