XII International PhD Workshop OWD 2010, 23-26 October 2010

Parallel implementation of Evolutionary Algorithm for QAP

Anna Obrączka, AGH University of Science and Technology

Abstract

This paper presents the implementation and tests of a parallel Evolutionary Algorithm (EA). The algorithm was implemented as an island model of EA, and OpenMP was used to obtain parallelization. It also uses an interesting migration model from [5], which is adapted here to the Quadratic Assignment Problem (QAP) for the first time.

Keywords: EA, QAP, parallelization, OpenMP

1. Quadratic Assignment Problem

The Quadratic Assignment Problem (QAP) was formulated in 1957 by Koopmans and Beckmann. QAP is an optimization problem and belongs to the class of NP-hard problems. Moreover, finding an approximate solution of QAP is NP-hard too.

QAP can be presented as a mathematical model. Consider a set $N = \{1, 2, \ldots, n\}$ and two $n \times n$ matrices $D = [d_{i,k}]$ and $F = [f_{j,l}]$. Let $\Pi$ denote the set of permutations of $N$. The quadratic assignment problem with coefficient matrices $D$ and $F$ can then be formulated as follows [2]:

$$\min_{\pi \in \Pi} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{\pi(i),\pi(j)} f_{i,j}. \tag{1}$$

Accordingly, QAP is the problem of finding the permutation $\pi \in \Pi$ which minimizes the double sum in (1). A function $\varphi(\pi)$, called the objective function, can be defined [2]:

$$\varphi(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{\pi(i),\pi(j)} f_{i,j}. \tag{2}$$

The optimal solution of QAP is the permutation $\pi_0$ minimizing function (2), and the corresponding objective function value $\varphi(\pi_0)$ is called the optimal value of QAP. Usually, the elements of matrix $D$ are called distances between locations, the elements of $F$ are called flows between facilities, and $\pi(i)$ denotes the number of the facility assigned to location $i$.

Solution methods for QAP can be divided into two groups: exact algorithms and heuristics. The three main exact techniques are dynamic programming, the cutting planes method and the branch and bound procedure, of which the last is the most effective.
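As an illustration of objective function (2), the following minimal Python sketch evaluates $\varphi(\pi)$ for a given permutation and finds the optimum of a tiny instance by enumerating all permutations. It is not taken from the implementation described in this paper: the names `qap_cost` and `brute_force`, the 0-based indexing and the 3 x 3 matrices are assumptions made only for this example.

```python
from itertools import permutations
from typing import Sequence, Tuple


def qap_cost(D: Sequence[Sequence[float]],
             F: Sequence[Sequence[float]],
             perm: Sequence[int]) -> float:
    """Evaluate phi(pi) = sum_i sum_j d[pi(i)][pi(j)] * f[i][j], as in (2).

    Indices are 0-based here, so perm is a permutation of 0..n-1.
    """
    n = len(perm)
    return sum(D[perm[i]][perm[j]] * F[i][j]
               for i in range(n) for j in range(n))


def brute_force(D: Sequence[Sequence[float]],
                F: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Exact solution by full enumeration; only feasible for very small n."""
    n = len(D)
    return min(permutations(range(n)), key=lambda p: qap_cost(D, F, p))


# Hypothetical 3 x 3 instance (symmetric matrices, zero diagonal).
D = [[0, 2, 3],
     [2, 0, 1],
     [3, 1, 0]]
F = [[0, 5, 2],
     [5, 0, 3],
     [2, 3, 0]]

best = brute_force(D, F)
print(best, qap_cost(D, F, best))  # prints "(2, 1, 0) 34"
```

Evaluating (2) directly costs $O(n^2)$ operations per permutation, and the enumeration visits all $n!$ permutations, so this sketch is useful only as a reference for checking other solvers on very small instances.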
The branch and bound method has been improved since 1962, when Gilmore solved a QAP of size $n = 8$. In 2000 the solution of nug30 ($n = 30$) from QAPLIB [3] was found by Anstreicher. QAP is said to be one of the hardest problems in combinatorial optimization and operations research [4]. Because of its high computational complexity, the size of QAP instances currently solved by exact methods is no more than $n = 30$. For example, for $n = 8$ the set $\Pi$ contains $n! = 8! = 40320$ permutations, while for $n = 30$ the number of permutations is $2.65252859812 \cdot 10^{32}$. This is the reason to apply heuristic algorithms to larger instances of QAP.

A heuristic is an algorithm used to find a solution quickly, but without a guarantee that it is correct or, in optimization, optimal. Heuristics are used when exact algorithms are unknown or not sufficiently effective. Examples include local search, constructive methods, simulated annealing, tabu search, the greedy randomized adaptive search procedure, ant colony algorithms and genetic algorithms.

There are many practical applications of QAP, such as architecture planning, analysis of economic dependencies, electrical wiring problems (especially VLSI), assigning machines to jobs [1], ranking of archeological data, ranking of a team in a relay race, and scheduling parallel production lines [4].

2. Evolutionary Algorithms

Evolutionary Algorithms (EA) are heuristic algorithms inspired by the theory of evolution. They operate on a set of solutions and, using selection and genetic operators, create a new set of better solutions that replaces the old one. The set of solutions is called the population and a single solution is called an individual. To evaluate the quality of a solution we define the objective function. The aim of an EA is to optimize the value of the objective function.

The first attempts to solve QAP with EA were hybrid realizations, which mixed the classic EA with other heuristics such as tabu search or simulated annealing. In 2000 Ahuja, Orlin and Tiwari proposed a greedy genetic algorithm, which coped very well with big instances of QAP. In 2003 Drezner obtained even better results by applying his EA mixed with tabu search and a special crossover operator. This new genetic algorithm is currently one of the best heuristics for solving QAPs [4].

Parallel realizations are very promising in the field of EA, because parallelism can speed up computations. Grefenstette considered the first theoretical ideas of parallel EA as early as 1981. Today the most popular and most often applied approach is the island model, which is also implemented in this work.

The island model divides the whole population into smaller groups, subpopulations called islands. There are two new elements in the island model: migration and topology. On each island the basic EA works independently, while migration and topology decide how the islands communicate. This communication means the exchange of individuals or, sometimes, of information about the parameters of the EA.
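The structure of the island model can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the algorithm implemented in this paper: the ring topology, the policy "a copy of the best individual replaces the worst individual of the next island", and the names `split_into_islands` and `migrate_ring` are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence

Individual = List[int]      # a permutation
Island = List[Individual]   # one subpopulation


def split_into_islands(population: Sequence[Individual],
                       n_islands: int) -> List[Island]:
    """Deal the individuals round-robin into n_islands subpopulations."""
    islands: List[Island] = [[] for _ in range(n_islands)]
    for idx, individual in enumerate(population):
        islands[idx % n_islands].append(list(individual))
    return islands


def migrate_ring(islands: List[Island],
                 cost: Callable[[Individual], float]) -> None:
    """Assumed policy: island k sends a copy of its best individual
    (lowest cost) to island k+1, where it replaces the worst one."""
    # Pick all emigrants first, so every island migrates from the same state.
    emigrants = [list(min(island, key=cost)) for island in islands]
    for k, emigrant in enumerate(emigrants):
        target = islands[(k + 1) % len(islands)]
        worst = max(range(len(target)), key=lambda i: cost(target[i]))
        target[worst] = emigrant


# Toy usage: six permutations, two islands, and a placeholder cost function.
population = [[0, 1, 2], [1, 0, 2], [2, 0, 1], [2, 1, 0], [1, 2, 0], [0, 2, 1]]
islands = split_into_islands(population, 2)
migrate_ring(islands, cost=lambda p: p[0])
```

Between two migrations each island would run its own EA generations independently; that part is omitted here because it does not depend on the island structure.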
Depending on the migration strategy, the individuals to exchange are chosen randomly, by some kind of selection, or as the best of the current subpopulation. Emigrating individuals can be copied or moved to another island. Similarly, immigrants can be added or can replace some individuals (for example the worst ones). We should also determine the values of the migration rate and the migration size, which mean, respectively, the frequency of exchange and the number of individuals to exchange.

Topology defines the structure of the islands and the connections between them. In practice there is no restriction, so any structure can be designed. There are also no limitations on the size and number of subpopulations: the number of individuals in each subpopulation can be equal or different, and the number of subpopulations can be large or small. This freedom makes it possible to produce different realizations of the island model algorithm.

3. Implementation description

In this work the island model of EA for QAP was implemented. Because QAP is a permutation problem, some elements of the algorithm must be adapted to it. Permutation encoding was chosen for individuals instead of binary encoding. This reduces the representation of a solution 4 times for the same instance size [1]. Besides, it is more intuitive. Fig. 1 shows the representation of an example solution for size $n = 10$.

Facilities: 10  2  5  9  8  7  3  1  6  4
Locations:   1  2  3  4  5  6  7  8  9 10

Fig. 1. Representation of example solution in QAP.

3.1. Topology

The topology is based on the super star-shape model [5]. Four subpopulations arranged in a star-shaped structure form a universe $U$. Universes are connected through their central subpopulations. This topology, for three universes, is shown in Fig. 2. The arrows indicate the possible directions of migration.

Fig. 2. Super star-shape topology.

3.2. Operators

To obtain a child population, genetic operators such as crossover and mutation are used. For permutation problems, specific operators are usually used which preserve the permutational character of solutions. In my algorithm two crossover operators, PMX and UX, and one mutation operator, EM, were implemented. For each of them a probability of application was defined, denoted $p_{PMX}$, $p_{UX}$ and $p_{EM}$ respectively. They must fulfil the condition:

$$p_{PMX} + p_{UX} + p_{EM} = 1 \tag{3}$$

The PMX (partially-mapped crossover) operator can be described in 3 steps:
1. Choose a subsequence in Parent1 and copy it to Child1.
2. From Parent2, copy to Child1 the values at the remaining positions which are not yet in it.
3. Fill the rest of the positions based on the mapping.

The mapping is created after the subsequence is chosen. It pairs the values occurring at the same positions of the subsequence in both parents. For the example in Fig. 3 the mapping is: 9 ↔ 1, 8 ↔ 8, 7 ↔ 9, 3 ↔ 2, 1 ↔ 6, 6 ↔ 4.

Parent1: 10  2  5  9  8  7  3  1  6  4    Child1:  2  5 10  9  8  7  3  1  6  4
Parent2:  3  5 10  1  8  9  2  6  4  7    Child2: 10  3  5  1  8  9  2  6  4  7

Fig. 3. Example of PMX operator.

The UX (uniform crossover) operator is similar to PMX. The main difference is that in the first step several individual positions are chosen instead of a subsequence. The remaining steps are the same as in PMX. For the example in Fig. 4, where positions 2, 3, 5, 9 and 10 were chosen, the mapping is: 2 ↔ 5, 5 ↔ 10, 8 ↔ 8, 6 ↔ 4, 4 ↔ 7.

Parent1: 10  2  5  9  8  7  3  1  6  4    Child1:  3  2  5  1  8  9 10  7  6  4
Parent2:  3  5 10  1  8  9  2  6  4  7    Child2:  2  5 10  9  8  6  3  1  4  7

Fig. 4. Example of UX operator.

The EM (exchange mutation) operator is very simple. Two positions are chosen and the values at these positions are exchanged. Fig. 5 shows an example of this operator, in which the positions to exchange are 4 and 7.

Parent: 10  2  5  9  8  7  3  1  6  4
Child:  10  2  5  3  8  7  9  1  6  4

Fig. 5. Example of EM operator.

3.3. Migration

The implemented migration strategy was drawn from [5], but in this work it is adapted to QAP for the first time. The main parameter is the value $\theta$, called the threshold, which satisfies:

$$0 \le \theta \le \max\{|fit_i - fit_j|\}, \tag{4}$$

where $fit_k$ denotes the fitness value of the best individual in subpopulation $k$. We define $\alpha$:

$$\alpha = \begin{cases} 1, & \text{for } |\Delta fit| > \theta, \\ 0, & \text{for } |\Delta fit| \le \theta, \end{cases} \tag{5}$$

where $\Delta fit = fit_i - fit_j$. In each generation, for every two subpopulations $i$ and $j$ between which migration is possible, the value of $\alpha$ is evaluated and then the following steps are carried out:
1. The direction of migration is determined on the basis of $\Delta fit$. If $\Delta fit < 0$, migration proceeds from subpopulation $i$ to $j$; if $\Delta fit > 0$, migration has the opposite direction; and if $\Delta fit = 0$, there is no migration.
2. If $\alpha = 1$, $S$ individuals are chosen to migrate using the tournament selection method; otherwise there is no migration.
3. The $S$ worst individuals are removed from the target subpopulation so that the immigrants can be received.

The value of $S$ is equal to $0.125 \cdot$ subpopulation size.

Additionally, two kinds of migration are distinguished: local and global. Local migration is the migration between subpopulations within one universe, and it is evaluated in each iteration. Global migration is the migration between universes, which means migration between central subpopulations, and it is evaluated every 10 iterations. The threshold values may be different for local and global migration; they are denoted $\theta_L$ and $\theta_G$ respectively.

3.4. Parallelism

In the implemented algorithm OpenMP was used to obtain real parallelism. OpenMP (Open Multi-Processing) is an Application Program Interface (API) for C, C++ and Fortran that may be used to explicitly direct multithreaded, shared memory parallelism [6]. OpenMP can be used in any multi-processor, shared memory environment, on both Unix/Linux and Microsoft Windows platforms. It provides simple and intuitive compiler directives. They define how to divide the code between processors, which parts should be executed sequentially and which in parallel, and the rules of data sharing.

Using OpenMP the code was divided into a parallel section, the basic island EA with local migration in each universe, and a short sequential section, the global migration between universes. This is illustrated in Fig. 6, which contains the pseudocode of the implemented algorithm.

begin
    Init
    t = 0
    while not STOPCondition do
        foreach U do parallel
        begin
            BasicEA(t)
            LocalMigration(t)
        end for
        if (t % 10 = 0)
            GlobalMigration(t)
        end if
        t = t + 1
    end while
end
return BestSolution

Fig. 6. Pseudo code for implemented algorithm

4. Experiments

The algorithm was implemented in C++, compiled and run on a computer with a dual-core processor, Intel Core 2 Duo T7500 2.20 GHz, with a Linux Gentoo system based on the 2.6.33 kernel; the compiler was gcc 4.4.3.

All tests were made on instances from QAPLIB [3]. QAPLIB is a library of QAP instances. It can be found at: http://www.seas.upenn.edu/qaplib/. It contains more than 100 test examples, together with their authors and best known solutions. Moreover, it is a good source of knowledge about current research on QAP and of contacts to the people who do this research.

4.1. Migration

Some experiments were designed to choose good values for the migration parameters. The migration parameters are the local ($\theta_L$) and global ($\theta_G$) thresholds and the number of universes ($U$). For each instance and each combination of parameters the algorithm was run 10 times, and the average and the best obtained solution $F_{EA}$ were compared.

First the local and global migration thresholds $\theta_L$ and $\theta_G$ were tested. They must fulfil condition (4), which means that they cannot be greater than the greatest difference between the best solutions of the subpopulations. However, for various instances the objective function is evaluated from various matrices $D$ and $F$. This causes a large divergence in the obtained values of this function. To compare values of $\theta_L$, a new parameter $\psi$ describing an instance was defined:

$$\psi = \max\{d_{i,j}\} \cdot \max\{f_{i,j}\} \tag{6}$$

The threshold $\theta_G$ was set equal to $\psi$. The number of universes $U$ was set to 4. We tested 32 instances for $\theta_L$ equal to 10%, 50%, 100% and 200% of $\psi$. We obtained the best solutions in the shortest time for $\theta_L = 10\%\,\psi$. The algorithm also found good solutions for $\theta_L = \frac{1}{2}\psi$, which means that frequent migration improves the computations.

The second experiment was similar to the previous one, except that the influence of the frequency of global migration was tested. The same parameter $\psi$ defined in (6) was used. The local migration threshold was set to $\frac{1}{2}\psi$ and the number of universes $U$ was still 4. The same 32 instances were tested for $\theta_G$ equal to 10%, 50%, 100% and 200% of $\psi$. The results of this test showed that the best solutions were obtained for $\theta_G = \frac{1}{2}\psi$. It is a compromise between fast convergence and diversity among universes.

The last migration parameter tested was the number of universes. $\theta_G$ was set to $\psi$ and $\theta_L$ to $\frac{1}{2}\psi$. The value of $U$ was varied: first it was 2, then 3 and 4. The conclusion from this test was very simple: the algorithm found better solutions for greater values of $U$. More details about the tests are in [7].

4.2. Parallelism

As described in sec. 3.4, OpenMP was used to parallelize the code. The parallel and sequential versions of the algorithm were tested to calculate the obtained acceleration. The experiment was run in the environment described at the top of this section. The tested instance was nug25. Tab. 1 contains the computation times for the sequential and parallel versions of the code: the sum of 20 runs $T_{20}$ and the average time of one run $T_{av}$.

Tab. 1. Comparison of sequential and parallel version of algorithm.

version      T_20 [s]   T_av [s]
sequential   792.47     39.62
parallel     475.35     23.77

The acceleration can be evaluated according to the equation:

$$S_p = \frac{T_1}{T_p}, \tag{7}$$

where $p$ is the number of processors, $T_1$ is the run time of the sequential algorithm and $T_p$ is the run time of the parallel version of the algorithm on $p$ processors.
The acceleration $S_2$ obtained on the dual-core processor was:

$$S_2 = \frac{792.47}{475.35} = 1.67 \tag{8}$$

The processor load was also checked. Fig. 7 shows the load of the cores while the sequential version was running, and Fig. 8 shows the load of the cores while the parallel version was running. It is easy to see that the parallel version of the code uses both cores at the same time and finishes sooner than the sequential version.

Fig. 7. Processor load while sequential version is running.

Fig. 8. Processor load while parallel version is running.

4.3. General evaluation

Tab. 2 contains the results of our experiments. $F_{best}$ is the best solution from QAPLIB, $F_{EA}$ is the best solution obtained during the three tests described previously, and $e$ is the percentage deviation from $F_{best}$:

$$e = \frac{F_{EA} - F_{best}}{F_{best}} \cdot 100\%. \tag{9}$$
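Equation (9) can be checked directly against individual rows of Tab. 2. The short Python sketch below does this for three instances; the function name `percent_deviation` is a hypothetical helper introduced only for this example, and the input values are copied from Tab. 2.

```python
def percent_deviation(f_best: float, f_ea: float) -> float:
    """Percentage deviation e of the obtained solution from the best known
    one, as defined in equation (9)."""
    return (f_ea - f_best) / f_best * 100.0


# (F_best, F_EA) pairs taken from Tab. 2.
rows = {
    "esc32a": (130, 138),
    "kra30a": (88900, 90090),
    "nug25": (3744, 3744),
}

for name, (f_best, f_ea) in rows.items():
    print(name, round(percent_deviation(f_best, f_ea), 2))
# prints:
# esc32a 6.15
# kra30a 1.34
# nug25 0.0
```

Because QAP is a minimization problem and $F_{best}$ is the best known value, $e$ is non-negative for every row, and $e = 0$ means that the best known solution value was reached.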
Tab. 2. Comparison between best known solutions from QAPLIB and our best obtained solutions.

name       F_best      F_EA        e [%]
esc32a     130         138         6.15
esc32b     168         168         0
esc32c     642         642         0
esc32d     200         200         0
esc32e     2           2           0
esc32g     6           6           0
esc32h     438         440         0.46
esc64a     116         116         0
esc128     64          68          6.25
kra30a     88900       90090       1.34
kra30b     91420       92840       1.55
kra32      88700       91620       3.29
lipa20a    3683        3722        1.06
lipa30a    13178       13412       1.78
lipa40a    31538       31968       1.36
lipa50a    62093       62921       1.33
nug25      3744        3744        0
nug27      5234        5264        0.57
nug28      5166        5202        0.7
nug30      6124        6158        0.56
rou12      235528      235528      0
rou15      354210      361318      2.01
rou20      725522      726920      0.19
scr12      31410       31410       0
scr15      51140       51140       0
scr20      110030      110968      0.85
tai25a     1167256     1194332     2.32
tai30a     1818146     1875472     3.15
tai40a     3139370     3268882     4.13
tai50a     4938796     5129828     3.87
tai60a     7205962     7530808     4.51
tai80a     13511780    14060892    4.06

5. Conclusions

The island model of EA was implemented and tested with the migration strategy from [5]. This was the first implementation of this migration model for QAP. OpenMP was used to parallelize the code.

The main tested element was migration. The experiments were carried out to check the influence of the migration parameters on the algorithm. For the number of universes a simple conclusion was obtained: the algorithm finds better solutions for greater values of $U$. For the thresholds ($\theta_L$ and $\theta_G$) it is hard to determine one universally good value; they should be adjusted individually to the instances. The proposed strategy is to set a lower value of the local migration threshold, to provide good information flow between subpopulations, and a higher threshold for global migration, to preserve diversity among universes.

Parallelism was also successfully tested. The obtained acceleration was equal to 1.67 on a dual-core processor. In a more parallel environment it should be possible to improve this result, because the algorithm can be used with more universes.

The algorithm found solutions which are very close to the best known solutions from QAPLIB.

Acknowledgment

Work financed by state science funds for 2008-2011 as a research project.
Contract no. N N514 414034.

Bibliography

[1] Chmiel Wojciech: Algorytmy ewolucyjne w optymalizacji przydziału zadań z kwadratową funkcją celu, PhD dissertation, AGH Kraków, Wydział EAIiE. Supervisor: K. Wala
[2] Çela Eranda: The quadratic assignment problem: theory and algorithms, Kluwer Academic Publishers, 1998
[3] QAPLIB: http://www.seas.upenn.edu/qaplib/
[4] Ji P., Wu Y., Liu H.: A Solution Method for the Quadratic Assignment Problem (QAP), ISORA'06, pp. 106-117, 2006
[5] Gu J., Gu X., Gu M.: A novel parallel quantum genetic algorithm for stochastic job shop scheduling, Journal of Mathematical Analysis and Applications, vol. 355, pp. 63-81, 2009
[6] OpenMP: https://computing.llnl.gov/tutorials/openmp/
[7] Obrączka Anna: Równoległe realizacje algorytmów ewolucyjnych, MSc thesis, AGH Kraków, Wydział EAIiE. Supervisor: W. Chmiel

Author:
mgr inż. Anna Obrączka
Akademia Górniczo-Hutnicza
Katedra Automatyki
Al. Mickiewicza 30/B1
30-059 Kraków
e-mail: anna.obraczka@gmail.com