Design and Analysis of Optimization Algorithms Using Computational Statistics


Appl. Num. Anal. Comp. Math., No. 3, 413–433 (2004) / DOI ./anac.47

Design and Analysis of Optimization Algorithms Using Computational Statistics

T. Bartz-Beielstein 1, K.E. Parsopoulos 2,3, and M.N. Vrahatis 2,3

1 Department of Computer Science XI, University of Dortmund, D-44221 Dortmund, Germany
2 Computational Intelligence Laboratory, Department of Mathematics, University of Patras, GR-26110 Patras, Greece
3 University of Patras Artificial Intelligence Research Center (UPAIRC)

Received 3 October 2004, revised December 2004, accepted December 2004
Published online December 2004

We propose a highly flexible sequential methodology for the experimental analysis of optimization algorithms. The proposed technique employs computational statistics methods to investigate the interactions among optimization problems, algorithms, and environments. The workings of the proposed technique are illustrated on the parameterization and comparison of both a population-based and a direct search algorithm, on a well-known benchmark problem, as well as on a simplified model of a real-world problem. Experimental results are reported and conclusions are derived.

1 Introduction

Modern search heuristics have proved to be very useful for solving complex real-world optimization problems that cannot be tackled through classical optimization techniques [1]. Many of these search heuristics involve a set of parameters that affect their convergence properties. Usually, an optimal parameter setting depends on the problem, as well as on restrictions posed by the environment, such as time and hardware constraints. We propose an approach for determining the parameters of optimization algorithms, tailored to the optimization problem at hand (we consider only minimization cases, although the technique can be straightforwardly applied to maximization problems). The proposed approach employs techniques from computational statistics and statistical experimental design.
Its workings are illustrated for both a population-based and a direct search algorithm, on a well-known benchmark problem, as well as on a simplified model of a real-life application, namely the optimization of an elevator group controller, extending the approaches proposed in [2] and [3]. For this purpose, the Particle Swarm Optimization (PSO) algorithm [4], which belongs to the class of swarm intelligence algorithms, and the Nelder and Mead Simplex method (NMS) [5] have been used. The rest of the paper is organized as follows: the considered algorithms are briefly described in Section 2. The computational statistics background is described in Section 3, while experimental results are reported in Sections 4 and 5. The paper closes with conclusions in Section 6.

2 The Considered Algorithms

2.1 The Particle Swarm Optimization Algorithm

PSO is a swarm intelligence optimization algorithm [6]. The main inspiration behind PSO was the flocking behavior of swarms and fish schools. PSO has proved to be very efficient in numerous applications in science and engineering [4, 7–14]. PSO's convergence is controlled by a set of parameters that are usually either determined empirically or set equal to widely used default values.

Corresponding author: e-mail: thomas.bartz-beielstein@udo.edu, Phone: +49 3 97977, Fax: +49 3 97959
e-mail: kostasp@math.upatras.gr, Phone: +3 6 997348, Fax: +3 6 99965
e-mail: vrahatis@math.upatras.gr, Phone: +3 6 997374, Fax: +3 6 99965

PSO belongs to the class of stochastic, population-based optimization algorithms [4]. It exploits a population of individuals to probe the search space. In this context, the population is called a swarm and the individuals are called particles. Each particle moves with an adaptable velocity within the search space, and it retains in a memory the best position it has ever visited. There are two main variants of PSO with respect to the information-exchange scheme among the particles. In the global variant, the best position ever attained by all individuals of the swarm is communicated to all the particles at each iteration. In the local variant, each particle is assigned to a neighborhood consisting of prespecified particles. In this case, the best position ever attained by the particles that comprise a neighborhood is communicated among them [4]. Neighboring particles are determined based on their indices rather than their actual distance in the search space. Clearly, the global variant can be considered as a generalization of the local variant, where the whole swarm is considered as the neighborhood for each particle. In the current work we consider the global variant only.

Assume an n-dimensional search space, S ⊂ R^n, and a swarm consisting of s particles. The i-th particle is an n-dimensional vector, x_i = (x_{i1}, x_{i2}, ..., x_{in}) ∈ S. The velocity of this particle is also an n-dimensional vector, v_i = (v_{i1}, v_{i2}, ..., v_{in}). The best previous position encountered by the i-th particle (i.e., its memory) in S is denoted by p_i = (p_{i1}, p_{i2}, ..., p_{in}) ∈ S. Assume g to be the index of the particle that attained the best previous position among all the particles in the swarm, and t to be the iteration counter.
Then, the resulting equations for the manipulation of the swarm are [7],

    v_i(t+1) = w v_i(t) + c_1 r_1 (p_i(t) − x_i(t)) + c_2 r_2 (p_g(t) − x_i(t)),   (1)
    x_i(t+1) = x_i(t) + v_i(t+1),   (2)

where i = 1, 2, ..., s; w is a parameter called the inertia weight; c_1 and c_2 are positive constants, called the cognitive and social parameter, respectively; and r_1, r_2 are vectors with components uniformly distributed in [0, 1]. All vector operations are performed componentwise. Usually, the components of x_i and v_i are bounded as follows,

    x_min ≤ x_ij ≤ x_max,    −v_max ≤ v_ij ≤ v_max,    j = 1, ..., n,

where x_min and x_max define the bounds of the search space, and v_max is a parameter that was introduced in early PSO versions to avoid the swarm explosion caused by the lack of a mechanism for controlling the velocity's magnitude. Although the inertia weight is such a mechanism, empirical results have shown that using v_max can further enhance the algorithm's performance. Experimental results indicate that it is preferable to initialize the inertia weight to a large value, in order to promote global exploration of the search space, and gradually decrease it to obtain more refined solutions. Thus, an initial value around 1 and a gradual decline towards 0 is considered a proper choice for w. If a maximum number of iterations, t_max, has been determined, then the inertia weight can be scaled according to the following scheme: if w_max is the maximum value of the inertia weight, two real-valued parameters, w_scale, w_iterscale ∈ [0, 1], are determined, such that w is linearly decreased from w_max to w_max w_scale over t_max w_iterscale iterations. Then, for the last t_max (1 − w_iterscale) iterations, it has a constant value, equal to w_max w_scale. Table 1 summarizes the exogenous parameters of PSO. Proper fine-tuning of the parameters may result in faster convergence and alleviation of local minima [2, 3, 7, 8]. Different PSO versions, such as PSO with a constriction factor, have been proposed [9]. In the constriction factor variant, Eq. (1) reads,

    v_i(t+1) = χ [ v_i(t) + c_1 r_1 (p_i(t) − x_i(t)) + c_2 r_2 (p_g(t) − x_i(t)) ],   (3)
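As an illustration, the update rules of Eqs. (1) and (2), together with the usual velocity and position bounds, can be sketched in Python. This is a generic sketch of the textbook update for a single particle; the parameter values below are illustrative placeholders, not the tuned settings discussed later in the paper:

```python
import random

def pso_step(x, v, p, p_g, w, c1=2.0, c2=2.0,
             x_min=-100.0, x_max=100.0, v_max=100.0):
    """One velocity/position update for a single particle (global PSO variant).

    x, v, p : current position, velocity, and personal best (lists of floats)
    p_g     : best position found by the whole swarm
    """
    n = len(x)
    new_x, new_v = [], []
    for j in range(n):
        r1, r2 = random.random(), random.random()   # uniform in [0, 1], Eq. (1)
        vj = w * v[j] + c1 * r1 * (p[j] - x[j]) + c2 * r2 * (p_g[j] - x[j])
        vj = max(-v_max, min(v_max, vj))            # clamp velocity to [-v_max, v_max]
        xj = max(x_min, min(x_max, x[j] + vj))      # Eq. (2), kept inside the search space
        new_v.append(vj)
        new_x.append(xj)
    return new_x, new_v
```

A full optimizer would apply this step to every particle and then refresh the personal bests p_i and the global best p_g from the new function values.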

Table 1  Default settings of the exogenous parameters of the PSO algorithm. Similar designs have been used in [6] to optimize well-known benchmark functions.

Symbol        Parameter                                              Range  Default  Constriction
s             swarm size                                             N      40       40
c_1           cognitive parameter                                    R+     2        1.4944
c_2           social parameter                                       R+     2        1.4944
w_max         starting value of the inertia weight w                 R+     0.9      0.729
w_scale       final value of w in percentage of w_max                R+     0.4      1.0
w_iterscale   percentage of iterations, for which w_max is reduced   R+     1.0      1.0
v_max         maximum value of the step size (velocity)              R+     100      100

Table 2  Default settings of the exogenous parameters of PSO with constriction factor. Recommendations from [9].

Symbol   Parameter                                   Range  Default Values
s        swarm size                                  N      40
χ        constriction coefficient                    R+     0.729
ϕ        multiplier for random numbers               R+     4.1
v_max    maximum value of the step size (velocity)   R+     100

where χ is the constriction factor [3], and it is derived analytically through the formula [9],

    χ = 2κ / | 2 − ϕ − √(ϕ² − 4ϕ) |,   (4)

for ϕ > 4, where ϕ = c_1 + c_2, and κ = 1. Different configurations of χ, as well as a thorough theoretical analysis of the derivation of Eq. (4), can be found in [9]. Equations (1) and (3) are algebraically equivalent. In our experiments, the so-called canonical PSO variant proposed in [3], which is the constriction variant defined by Eq. (3) with c_1 = c_2, has been used. The corresponding parameter setting for the constriction factor variant of PSO is reported in the last column (denoted as "Constriction") of Table 1, where χ is reported in terms of its equivalent inertia weight notation, for uniformity reasons. PSO is a stochastic algorithm, and, therefore, random number generators are used. An optimization practitioner is interested in robust solutions, i.e., solutions independent from the seeds of the random number generators. The proposed statistical methodology provides guidelines to design robust PSO algorithms under restrictions, such as a limited number of function evaluations and processing units.
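The constriction coefficient of Eq. (4) is straightforward to compute directly. A minimal sketch, using the Table 2 values only as a sanity check (κ = 1, ϕ = 4.1 should give χ ≈ 0.729):

```python
import math

def constriction(phi, kappa=1.0):
    """Constriction coefficient of Eq. (4); defined for phi > 4."""
    if phi <= 4.0:
        raise ValueError("Eq. (4) requires phi > 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

chi = constriction(4.1)    # approximately 0.7298
c = chi * 4.1 / 2.0        # equivalent c1 = c2 of the canonical variant, roughly 1.496
```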
These restrictions can be modeled by considering the performance of the algorithm in terms of the (expected) best function value for a limited number of fitness function evaluations. A discussion of problems from different classes, including real-world optimization problems, is provided in [5].

2.2 The Nelder-Mead Simplex Algorithm

The Nelder-Mead Simplex (NMS) algorithm was developed by Nelder and Mead in 1965 [5]. It was motivated by the observation that (n+1) points are adequate to identify a downhill direction in an n-dimensional landscape. However, (n+1) points also define a non-degenerate simplex in R^n. Thus, it seemed a good idea to exploit a simplex for probing the search space, using only function values [3]. Nelder and Mead incorporated a set of moves that enhance the algorithm's performance, namely reflection, expansion, contraction, and shrinkage. A new point is generated at each iteration. Its function value is compared to the function values of the vertices of the simplex. One of the vertices is replaced by the new point. Reflection reflects a vertex of the simplex through the centroid of the opposite face. Expansion allows the algorithm to take a longer step from the reflection point (centroid) towards the reflected vertex, while contraction halves the length of the step, thereby resulting in a more conservative search. Finally, shrinkage reduces the length of all edges that are adjacent to the best vertex, i.e., the vertex with the smallest function value. Thus, there are 4 parameters to be specified, namely the coefficients of reflection, ρ, expansion, χ, contraction, γ, and shrinkage, σ. Default settings of these parameters are reported later in Table 8. NMS is considered a quite robust but relatively slow algorithm [3]. However, it works reasonably well for non-differentiable functions.
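For concreteness, one NMS iteration with the four moves can be sketched as follows. This is a simplified textbook variant (with the commonly used coefficients ρ = 1, χ = 2, γ = 0.5, σ = 0.5 as defaults), not the authors' implementation or the exact Table 8 settings:

```python
def nms_iteration(simplex, f, rho=1.0, chi=2.0, gamma=0.5, sigma=0.5):
    """One Nelder-Mead iteration on an (n+1)-point simplex.

    rho/chi/gamma/sigma are the reflection, expansion, contraction,
    and shrinkage coefficients described in the text.
    """
    simplex = sorted(simplex, key=f)                 # best vertex first
    best, worst = simplex[0], simplex[-1]
    n = len(best)
    # centroid of the face opposite the worst vertex
    c = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
    xr = [c[j] + rho * (c[j] - worst[j]) for j in range(n)]        # reflection
    if f(xr) < f(best):
        xe = [c[j] + chi * (xr[j] - c[j]) for j in range(n)]       # expansion
        simplex[-1] = xe if f(xe) < f(xr) else xr
    elif f(xr) < f(simplex[-2]):
        simplex[-1] = xr                                           # accept reflection
    else:
        xc = [c[j] + gamma * (worst[j] - c[j]) for j in range(n)]  # contraction
        if f(xc) < f(worst):
            simplex[-1] = xc
        else:                                                      # shrink toward best
            simplex = [best] + [[best[j] + sigma * (p[j] - best[j])
                                 for j in range(n)] for p in simplex[1:]]
    return sorted(simplex, key=f)
```

Repeating this iteration drives the simplex toward a minimizer using function values only, which is why the method also handles non-differentiable objectives.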

3 Computational Statistics

The term computational statistics subsumes computationally intensive methods [33]. Statistical methods, such as experimental design techniques and regression analysis, can be used to analyze the experimental setting of algorithms on specific test problems. One important goal in the analysis of search algorithms is to find variables that have a significant influence on the algorithm's performance, which can be quantitatively defined as the average obtained function value in a number (e.g. 50) of independent experiments. This measure was also used in [6]. Questions like "how does a variation of the swarm size influence the algorithm's performance?" or "are there any interactions between the swarm size and the value of the inertia weight?" are important research questions that provide an understanding of the fundamental principles of stochastic search algorithms such as PSO. The approach presented here combines classical techniques from statistical design of experiments (DOE), classification and regression trees (CART), and modern design and analysis of computer experiments (DACE) techniques. Thorough descriptions of these methodologies can be found in [34–36]. In [37], a comparison of DOE, CART and DACE for direct search algorithms is provided.

3.1 Design and Analysis of Computer Experiments (DACE)

Let a denote a set of vectors with specific settings of an algorithm. This is called the algorithm design. In the case of PSO, a can include parameters such as the swarm size, the values of c_1 and c_2, etc. Note that a design can consist of one vector only. The optimal design is denoted as a*. The problem design, p, provides information related to the optimization problem, such as the available resources (number of function evaluations), the problem's dimension, etc. The run of a stochastic search algorithm can be treated as an experiment with a stochastic output, Y(a, p).
One of our goals is to find an algorithm design, a*, that optimizes the output (function value). Since DACE was introduced for deterministic computer experiments, repeated runs are necessary to apply this technique to stochastic search algorithms. In the following, the specification of the DACE process model that will be used later to analyze our experiments is described. This specification is similar to the selection of a linear or quadratic regression model in classical regression. DACE provides methods to predict unknown values of a stochastic process, and it can be applied to interpolate observations from computationally expensive simulations. In the DACE stochastic process model, a deterministic function is evaluated at m design points. The stochastic process model proposed in [38] expresses the deterministic response, y(x), for a d-dimensional input, x, as a realization of a regression model, F, and a stochastic process, Z,

    Y(x) = F(β, x) + Z(x).   (5)

The model used in classical regression, Y = βx + ε, is a special case of Eq. (5). The stochastic process Z(·) is assumed to have zero mean and covariance equal to V(ω, x) = σ² R(θ, ω, x) between Z(ω) and Z(x), with process variance σ² and correlation model R(θ, ω, x). Following the analysis in [38], we use ρ functions, f_j : R^d → R, j = 1, ..., ρ, to define the regression model,

    F(β, x) = Σ_{j=1}^{ρ} β_j f_j(x) = ⟨f(x), β⟩,

where f(x) = (f_1(x), ..., f_ρ(x)), β = (β_1, ..., β_ρ), and ⟨·, ·⟩ denotes the inner product of the vectors. DACE also provides an estimation of the prediction error at an untried point, i.e., the mean squared error, MSE(x), of the predictor. Correlations of the form,

    R(θ, ω, x) = Π_{j=1}^{d} R_j(θ, ω_j − x_j),

are considered in our experiments. The correlation function should be chosen with respect to the underlying process [39]. In [40], seven different models are discussed. The Gaussian correlation function,

    R_j(θ, h_j) = exp(−θ_j h_j²),   (6)

has been used in our experiments, with h_j = ω_j − x_j, and θ_j > 0.

Table 3  Sequential approach. This approach combines methods from computational statistics and exploratory data analysis to improve (tune) the performance of direct search algorithms.

Step    Action
(S-1)   Pre-experimental planning
(S-2)   Scientific hypothesis
(S-3)   Statistical hypothesis
(S-4)   Specification of (a) the optimization problem, (b) constraints, (c) the initialization method, (d) the termination method, (e) the algorithm (important factors), (f) the initial experimental design, (g) the performance measure
(S-5)   Experimentation
(S-6)   Statistical modeling of data and prediction
(S-7)   Evaluation and visualization
(S-8)   Optimization
(S-9)   Termination: if the obtained solution is good enough, or the maximum number of iterations has been reached, go to step (S-11)
(S-10)  Design update; go to step (S-5)
(S-11)  Rejection/acceptance of the statistical hypothesis
(S-12)  Objective interpretation of the results from step (S-11)

3.2 Sequential Designs Based on DACE

Prior to the execution of experiments with an algorithm, the experimenter has to specify suitable parameter settings for the algorithm, i.e., an algorithm design, a. Often, designs that use sequential sampling are more efficient than designs with fixed sample sizes. First, an initial design, a^(0), is specified. Information obtained in the first runs can be used for the determination of the second design, a^(1), in order to choose new design points more efficiently. Sequential sampling approaches with adaptation have been proposed for DACE. For example, in [38], sequential sampling approaches with and without adaptation to the existing meta-model were classified. We will present a sequential approach that is based on the expected improvement. In [36, p.
78], a heuristic algorithm for unconstrained global minimization problems is presented, and, if y_min^k denotes the smallest known minimum value after k runs of the algorithm, y(x) is the algorithm's response (a realization of Y(x) in Eq. (5)), and x ∈ a, i.e., x represents a specific design point from the algorithm design, a, then the improvement is defined as,

    Improvement at x = y_min^k − y(x),  if y_min^k − y(x) > 0,
                       0,               if y_min^k − y(x) ≤ 0.      (7)

The discussion in [36, p. 78] leads to the conclusion that new design points are attractive if either there is a high probability that their predicted output is below the current observed minimum and/or there is a large uncertainty in the predicted output. This result is in line with the experimenter's intention to avoid sites that guarantee worse results. The sequential approach consists of the twelve steps that are reported in Table 3. During the pre-experimental planning phase (S-1), the experimenter defines exactly what is to be studied and how the data are to be collected. The recognition and statement of the problem seems to be a rather obvious task. However, in practice, it is not simple to formulate a generally accepted goal. Discovery, confirmation, and robustness are only three possible scientific goals of an experiment. Discovery asks what happens if new operators are implemented. Confirmation

analyzes how the algorithm behaves on different problems, and robustness asks for conditions that decrease the algorithm's performance. Statistical methods like run-length distributions (RLD) provide suitable means to measure the performance and describe the qualitative behavior of optimization algorithms. A plot of an RLD for our experiments is depicted in Fig. 6. The algorithm to be analyzed is run k times, with different random number generator seeds, on a given problem instance. The maximum number of function evaluations, t_max, is set to a relatively high value. For each successful run, the number of required function evaluations, t_run, is recorded. If the run fails, t_run is set to infinity. The empirical cumulative distribution function (CDF) represents these results. Let t_run(j) be the run length for the j-th successful run. Then, the empirical CDF is defined as [4],

    Pr(t_run(j) ≤ t) = #{j : t_run(j) ≤ t} / k,   (8)

where #{j : t_run(j) ≤ t} denotes the number of indices j such that t_run(j) ≤ t. In step (S-2), the experimental goal should be formulated as a scientific hypothesis, e.g. "does an employed scheme A improve the algorithm's performance?". A statistical hypothesis, such as "there is no difference in means comparing the performance of the two competing schemes", is formulated in the step (S-3) that follows. Step (S-4) requires at least the specification of: (a) an optimization problem, (b) constraints (for example, the maximum number of function evaluations), (c) an initialization method, (d) a termination method, (e) an algorithm, and the specification of its important factors, (f) an initial experimental design, and (g) a measure to judge the performance.
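The empirical run-length distribution of Eq. (8) is simple to compute; a small sketch with illustrative data only:

```python
def rld(run_lengths, t):
    """Empirical CDF of Eq. (8): the fraction of the k runs whose recorded
    run length t_run(j) is at most t. Failed runs carry t_run = infinity,
    so they never count as successes."""
    k = len(run_lengths)
    return sum(1 for t_run in run_lengths if t_run <= t) / k

# illustrative run lengths for k = 4 runs, one of which failed
runs = [90.0, 100.0, 250.0, float("inf")]
```

Evaluating rld over a grid of budgets t yields the RLD curves used to compare algorithm configurations.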
Regarding (c), several methods have been used for the initialization of the population in population-based algorithms, or for the determination of an initial point, x^(0), in algorithms that use a single search point. More specifically, the individuals that comprise a population can be initialized:

(I-1) deterministically with equal starting points, i.e., the same point, x^(0), is assigned to every individual. The starting point is specified by the experimenter;
(I-2) deterministically, but with different starting points, x^(0), for each repeated run. The corresponding points x^(0) are selected from an n-dimensional interval, [x_l, x_u]^n, where x_l and x_u specify the lower and the upper bounds, respectively.

Stochastic variants of (I-1) and (I-2) are denoted as (I-3) and (I-4), respectively. Nowadays, (I-4) is commonly used in evolutionary computation, whereas (I-2) is widely used for global optimization. For example, an asymmetric initialization scheme was used in [6], where the initial positions of the particles, x_i(0), i = 1, ..., s, were chosen uniformly distributed in the range [15, 30]^n. Initialization method (I-2) was proposed in [4].

An algorithm terminates if:
(1) the problem was solved; domain convergence (T-1) and function-value convergence (T-2) can be distinguished;
(2) the algorithm has stalled (T-3);
(3) the resources, e.g. the maximum number of function evaluations, t_max, are exhausted (T-4).

The corresponding problem design, p, that summarizes the information from (a) to (d) for our experiments with PSO is reported in Table 4 (a description of the S-ring model is postponed until the presentation of the experimental results), while the algorithm design, a, is reported in Table 5 and summarizes steps (e) and (f). An experimental design, e, consists of both a problem and an algorithm design. The experimental goal of the sequential approach presented here can be characterized as the determination of an optimal (or improved)

algorithm design, a*, for a given problem design, p. Measures to judge the performance are presented in the next sections (Table 7).

Table 4  Problem design, p, for the PSO runs. The experiment's number, the number of runs, k, the maximum number of function evaluations, t_max, the problem's dimension, n, the initialization method, the termination criterion, the lower, x_l, and upper, x_u, bounds for the initialization, as well as the optimization problem are reported.

Experiment  N    t_max  n    Init.   Term.   x_l   x_u   Problem
1           50                (I-4)   (T-4)   15    30    Rosenbrock (F)
2           50                (I-3)   (T-4)               S-ring

Table 5  Algorithm design, a, for the inertia weight PSO variant that corresponds to experiment 1 of Table 4, which optimizes the Rosenbrock function. a^(l) and a^(u) denote the lower and upper bounds to generate the LHD, respectively, and a* denotes the parameter settings of the improved design that was found by the sequential approach.

         s      c_1      c_2       w_max     w_scale   w_iterscale  v_max
a^(l)    5...7..5
a^(u)    .5.5.99.5 75
a*       .543   .74587   .788797   .8645     .93793    .496

The constriction factor variant of PSO requires the determination of four exogenous strategy parameters, namely the swarm size, s, the constriction factor, χ, the parameter ϕ = c_1 + c_2, and the maximum velocity, v_max. At each stage, Latin hypercube designs (LHD) are used. In [43] it is reported that experience with the stochastic process model had indicated that 10 times the expected number of important factors is often an adequate number of runs for the initial LHD. Thus, an LHD with at least m = 15 design points was chosen. This is the minimum number of design points to fit a DACE model that consists of a second-order polynomial regression model and a Gaussian correlation function. The former requires 1 + Σ_{i=1}^{4} i = 11 design points, while the latter requires 4 design points. Note that for m = 15 there are no degrees of freedom left to estimate the mean squared error of the predictor [36]. After that, the experiment is run (S-5).
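An LHD can be generated with the standard stratified construction: each dimension is split into m equal strata, and each stratum is sampled exactly once. The sketch below is one common construction, not necessarily the generator used by the authors:

```python
import random

def lhd(m, bounds, seed=None):
    """Latin hypercube design with m points over the given (lo, hi) bounds.

    Per dimension, the interval is split into m equal strata; a random
    permutation assigns one stratum to each design point, and a uniform
    draw places the point inside its stratum.
    """
    rng = random.Random(seed)
    n = len(bounds)
    cols = []
    for (lo, hi) in bounds:
        perm = list(range(m))
        rng.shuffle(perm)                 # random stratum order for this dimension
        width = (hi - lo) / m
        cols.append([lo + (p + rng.random()) * width for p in perm])
    return [[cols[j][i] for j in range(n)] for i in range(m)]
```

For example, lhd(15, bounds) with one (lo, hi) pair per row of Table 5 would produce an initial design of the size discussed above.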
Preliminary (pilot) runs can give a rough estimate of the experimental error, run times, and the consistency of the experimental design. Again, RLDs can be very useful. Since we consider probabilistic search algorithms in our investigation, functions may be evaluated several times, as discussed in [36]. The experimental results provide the basis for modeling and prediction in step (S-6). The model is fitted and a predictor is obtained for each response. The model is evaluated in step (S-7). Several visualization techniques can be applied. Simple graphical methods from exploratory data analysis are often helpful. Histograms and scatter plots can be used to detect outliers. If the initial ranges for the designs were chosen improperly (e.g., very wide initial ranges), visualization of the predictor can guide the choice of more suitable (narrower) ranges in the next stage. Several techniques to assess the validity of the model have been proposed. Cross-validation predictions versus actual values, as well as standardized cross-validation residuals versus cross-validation predictions, can be plotted. Sensitivity analysis can be used to ascertain how much a statistical model depends on its factors. In [44–47], variance-based methods that are used in sensitivity analysis are analyzed, while in [36, pp. 93] sensitivity analyses based on ANOVA-type decompositions are described. The computation of sensitivity indices can be performed by decomposing the response into an average, main effects for each input, two-input interactions, and higher-order interactions [38, pp. 47]. Additional graphical methods can be used to visualize the effects of factors and their interactions on the predictors. The 3-dimensional visualizations depicted in Fig. 1, produced with the DACE toolbox [48], have proved to be very useful. The predicted values can be plotted to support the numerical analysis, and the MSE of prediction is used to assess the accuracy of the prediction.
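Two computational ingredients of this modeling step are tiny: the Gaussian correlation of Eq. (6), with h_j = ω_j − x_j, and the improvement criterion of Eq. (7). A plain-Python sketch:

```python
import math

def gauss_corr(theta, w, x):
    """Product Gaussian correlation:
    R(theta, w, x) = prod_j exp(-theta_j * (w_j - x_j)**2), cf. Eq. (6)."""
    return math.exp(-sum(t * (wj - xj) ** 2 for t, wj, xj in zip(theta, w, x)))

def improvement(y_min_k, y_x):
    """Improvement at a design point, Eq. (7): positive only when the
    response y(x) beats the smallest value y_min^k observed in k runs."""
    return max(y_min_k - y_x, 0.0)
```

In a full DACE fit, gauss_corr fills the correlation matrix of the observed design points, while improvement (or its expectation under the predictor) scores candidate points for the next stage.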
We explicitly note here that statistical models can provide only guidelines for further experiments; they do not prove that a factor has a particular effect. If the predicted values are not accurate, the experimental setup has to be reconsidered. This comprises, especially, the specification of the scientific goal and the ranges of the design variables. Otherwise, new design

4 T. Bartz-Beielstein, K.E. Parsopoulos, and M.N. Vrahatis: Design and Analysis of Optimization Algorithms

points in promising subregions of the search space can be determined (S-8) if further experiments are necessary. Thus, a termination criterion has to be tested (S-9). If it is not fulfilled, based on the expected improvement defined by Eq. (7), new candidate design points can be generated (S-10). A new design point is selected if there is a high probability that the predicted output is below the current observed minimum and/or there is a large uncertainty in the predicted output. Otherwise, if the termination criterion is true and the obtained solution is good enough, the final statistical evaluation (S-11) that summarizes the results is performed. A comparison between the first and the improved configuration should be performed. Techniques from exploratory data analysis can complement the analysis at this stage. Besides an investigation of the numerical values, such as mean, median, minimum, maximum, and standard deviation, graphical presentations such as boxplots, histograms, and RLDs can be used to support the final statistical decision (e.g., see Fig. 4). Finally, we have to decide whether the result is scientifically important (S-12), since the difference, although statistically significant, can be scientifically meaningless. Here the experimenter's skill comes into play. The experimental setup should be reconsidered at this stage, and questions like "have suitable test functions or performance measures been chosen?" or "did floor or ceiling effects occur?" must be answered. Test problems that are too easy may cause such ceiling effects. If two algorithms, A and B, achieve their maximum level of performance (or close to it), then the hypothesis "performance of A is better than performance of B" should not be confirmed [49]. Floor effects describe the same phenomenon on the opposite side of the performance scale, i.e., the test problem is so hard that nearly no algorithm can solve it correctly.
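The expected-improvement criterion referred to above (Eq. (7)) balances exactly the two conditions named in the text: a predicted output below the current minimum, and a large prediction uncertainty. A minimal sketch, assuming a Gaussian predictive distribution with mean `y_hat` and standard deviation `s` (names illustrative, not the paper's notation):

```python
import math

# Expected improvement over the current best observation f_min at a
# candidate design point with predicted value y_hat and prediction
# standard deviation s.
def expected_improvement(y_hat, s, f_min):
    if s <= 0.0:
        return max(f_min - y_hat, 0.0)         # no uncertainty left
    u = (f_min - y_hat) / s
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # N(0,1) pdf
    Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # N(0,1) cdf
    return (f_min - y_hat) * Phi + s * phi
```

Points with low predicted value or high MSE both receive large EI, which is why they become the next candidate designs.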
Such effects can occur when the number of function evaluations is chosen very small. For example, in the Rosenbrock function, a remote starting point combined with a very small budget of function evaluations can cause such effects. Performance profiles can help the experimenter to decide whether ceiling effects have occurred.

4 Experimental Results

Initially, we investigate a simple and well known test function [6] to gain an intuition regarding the workings of the proposed technique. In the next step of our analysis, the S-ring model was considered. We provide a demonstration of the sequential approach by conducting a brief investigation for the Rosenbrock function, using the two variants of PSO as well as the NMS algorithm. Experimental results of evolutionary algorithms presented in empirical studies are sometimes based on a huge number of fitness function evaluations (larger than 10^5), even for simple test functions. Our goal is to demonstrate how statistical design methods, e.g., DACE, can reduce this number significantly. The proposed approach is thoroughly analyzed for the case of the inertia weight variant of PSO.

4.1 Optimizing the inertia weight variant of PSO

This example describes in detail how to tune the exogenous parameters of PSO, and it extends the approach presented in [8]. Experimental results presented in [6] have been chosen as a starting point for our analysis.

(S-1) Pre-experimental planning: Pre-experimental tests to explore the optimization potential supported the assumption that tuning might improve the algorithm's performance. RLDs revealed that there exists a configuration that was able to complete the run successfully, using fewer function evaluations, for nearly 80% of the cases. This was less than half the number of function evaluations used in the reference study, justifying the usefulness of the analysis.

(S-2) Scientific hypothesis: There exists a parameterization, a*, of PSO that improves its efficiency significantly.
(S-3) Statistical hypothesis: PSO with the parameterization a* outperforms PSO with the default parameterization, a^(0), which is used in [6].

(S-4) Specification: Results from the Sphere function that was investigated in [6] were not very meaningful. Therefore, the generalized Rosenbrock function was chosen for our comparisons. This test problem is, generally, defined as

f(x) = \sum_{i=1}^{n-1} \left[ 100 \, (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right],   (9)

Table 6 Inertia weight PSO optimizing the 10 dimensional Rosenbrock function. Each row represents the best algorithm design at the corresponding tuning stage. Note that function values (reported in the first column) can worsen (increase) although the design is improved. This happens due to the noise in the results, y. The probability that a seemingly good function value, which is in fact worse, might occur decreases during the sequential procedure, because the number of re-evaluations is increased. The number of repeats is doubled if a configuration performs best twice. The corresponding configurations are marked with an asterisk.

y  s  c1  c2  wmax  wScale  wIterScale  vmax  Config
6.6557 6.45747.9885.774.4878.683856 477.874 4
8.596 39.343.8494.875.73433.83638 89.9 9
7.44 6.45747.9885.774.4878.683856 477.874 4
78.477 3.96.63.94476.897.893788 37.343 3
75.654 3.96.63.94476.897.893788 37.343 3
9.935 8.849.6993.9585.56979.84937 95.39 35
9.5438.557.577.93759.49868.5967 68.9 43
93.754.5898.49.785.46967.54545 98.974 5
93.9967 93.76.8.9663.3786.97556.765 99.485
39.343.8494.875.73433.83638 89.9 9
7.595.3995.36.7853.36658.966 56.996 57
46.47.5468.4853.87656.39995.9974 6.56 47
.4.7657.7343.795.355.57449 5.5 54
98.3663.7657.7343.795.355.57449 5.5 54
4.3997.543.74587.788797.8645.93793.496 67
43.49.543.74587.788797.8645.93793.496 67
53.3545.543.74587.788797.8645.93793.496 67

where n is the problem's dimension. Its global minimizer is x* = (1, ..., 1), with function value f* = 0, and x^(0) = (-1.2, 1) is considered a good starting point [5]. We considered the 10 dimensional case of the function. Following the experimental design in [6], we recorded the mean fitness value of the best particle of the swarm at each one of the 50 runs. This value is denoted as f^(50). For the generation of RLD plots, a threshold value to distinguish successful from unsuccessful runs was specified.
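The test problem of Eq. (9) can be sketched directly; this is the standard generalized Rosenbrock function, with the global minimum at the all-ones vector:

```python
# Generalized Rosenbrock function, Eq. (9): the global minimizer is
# x* = (1, ..., 1) with f(x*) = 0.
def rosenbrock(x):
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))
```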
A run configuration was classified as successful if f^(50) < f^(Shi), where f^(Shi) = 96.75 is the value reported in [6]. The initial problem design, which consists of the objective function, its dimension, the maximum number of function evaluations, the initialization method, and the termination criterion, was reported in Table 4. The corresponding setting for the algorithm design is reported in Table 5. An algorithm design that covers a wide range of interesting parameter settings (region of interest) was chosen, and no problem specific knowledge for the Rosenbrock function was used to perform the experiments, expecting that the sequential approach would guide the search into successful regions.

(S-5) Experimentation: Table 6 presents the optimization process. Each line in Table 6 corresponds to one optimization step in the sequential approach. At each step, two new designs are generated and the best is re-evaluated. This is similar to the selection procedure in (1+2) Evolution Strategies [5]. The number of repeat runs, k, of the algorithm designs is increased (doubled) if a design has performed best twice or more. A starting value of k = 2 was chosen. For example, design 4 performs best at iteration 1 and iteration 3. It has been evaluated 4 times; therefore, the number of evaluations is set to 4 for every newly generated design. This provides a fair comparison and reduces the risk of incorrectly selecting a worse design.

(S-6) Statistical modeling and prediction: Following [36], the response is modeled as a realization of a regression model and a random process, as described in Eq. (5). A Gaussian correlation function as defined in

Fig. 1 Regression tree. Values at the nodes show the average function values for the associated node. The value in the root node is the overall mean. The left son of each node contains the configurations that fulfill the condition in the node. c1 + c2 > 4 produced outliers that complicate the analysis. In addition, this analysis shows that the swarm size, s, should be larger than 8.

Eq. (6), and a regression model with a polynomial of order 2, have been used. Hence, the model reads

Y(x) = \sum_{j=1}^{\rho} \beta_j f_j(x) + Z(x),   (10)

where Z(·) is a random process with mean zero and covariance V(ω, x) = σ² R(θ, ω, x), and the correlation function was chosen as

R(θ, ω, x) = \prod_{j=1}^{n} \exp\left( -θ_j (ω_j - x_j)^2 \right).   (11)

Additionally, at certain stages, a tree based regression model, as shown in Fig. 1, was constructed to determine parameter settings that produce outliers.

(S-7) Evaluation, visualization: The MSE and the predicted values can be plotted to support the numerical analysis (we produced all 3 dimensional visualizations with the DACE toolbox [48]). For example, the interaction between c1 and c2 is shown in Fig. 2. Values of c1 and c2 with c1 + c2 > 4 generate outliers that might disturb the analysis. To alleviate these outliers, a design correction method has been implemented, namely c2 = 4 - c1 if c1 + c2 > 4. The right part of Fig. 2 illustrates the estimated MSE. Since no design point has been placed in parts of the ranges of c1 and c2, the MSE is relatively high there. This might be an interesting region where a new design point will be placed during the next iteration. Figure 3 depicts the same situation as Fig. 2 after the application of the design correction. In this case, a high MSE is associated with the region c1 + c2 > 4, but no design point will be placed there.

(S-8) Optimization: Termination or design update. Based on the expected improvement defined in Eq.
(7), two new design points, a^(1) and a^(2), have been generated. These two designs will be evaluated and their performances will be compared to the performance of the current best design. The best design found so far is re-evaluated. The iteration terminates if a design was evaluated for k = 50 times and the solution is obtained. The values in the final model read: s = , c1 = .543, c2 = .74587, wmax = .788797, wscale = .8645, witerscale = .93793, and vmax = .496. This result is shown in the last row of Table 6.
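The selection-and-re-evaluation loop of steps (S-5) and (S-8) can be sketched as follows. The objective, the one-dimensional design space, and the names `evaluate` and `sequential_tuning` are all illustrative stand-ins, not the paper's setup; only the structure (two new candidates per stage, incumbent re-evaluated, repeats doubled after two wins) follows the text:

```python
import random

# Sequential-tuning sketch: at each stage two new candidate designs are
# drawn, the incumbent is re-evaluated alongside them, and the repeat
# count k is doubled once a design has performed best twice.
def evaluate(design, k, rng):
    # noisy response: mean of k runs of an illustrative objective
    return sum((design - 0.3) ** 2 + rng.gauss(0.0, 0.1) for _ in range(k)) / k

def sequential_tuning(stages=5, k=2, seed=0):
    rng = random.Random(seed)
    best, wins = rng.random(), 0
    for _ in range(stages):
        candidates = [best] + [rng.random() for _ in range(2)]
        scores = [evaluate(d, k, rng) for d in candidates]
        winner = candidates[scores.index(min(scores))]
        wins = wins + 1 if winner == best else 1
        best = winner
        if wins >= 2:            # incumbent confirmed twice: double repeats
            k, wins = 2 * k, 0
    return best, k
```

Doubling k only for repeatedly winning designs keeps the comparison fair while spending re-evaluations where the noise matters most.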

Fig. 2 Predicted values (left) and MSE (right). As can be seen in the left figure, c1 + c2 > 4 produces outliers that complicate the analysis. The plots present the same data as the regression tree in Fig. 1.

Fig. 3 Predicted values (left) and MSE (right). The design correction avoids settings with c1 + c2 > 4 that produce outliers (left). Therefore, a high mean square error exists in the excluded region (right).

(S-11) Rejection/acceptance: Finally, we compare the configuration from [6] to the optimized configuration. The final (tuned) and the first configurations are repeated 50 times. Histograms and boxplots are illustrated in Fig. 4 for both variants of PSO. The tuned design of the inertia weight PSO variant clearly improves the performance of the PSO algorithm. Statistical analysis is reported in Table 7. Performing a classical t test indicates that the null hypothesis "there is no difference in the mean performances of the two algorithms" can be rejected at the 5% level.

(S-12) Objective interpretation: The statistical results from (S-11) give good reasons for the assumption that PSO with the tuned design performs better (on average) than the default design. Comparing the parameters of the improved design, a*, reported in Table 5, with the default settings of PSO, no significant differences can be observed. Only vmax, the maximum value of the velocity, appears to be relatively small. The selected swarm size appears to be a good value for this problem instance. The analysis and the tuning procedure described so far have been based solely on the average function value in 50 runs. This value may be irrelevant in a different optimization context. For example, the best function value (minimum) or the median could alternatively be used. A similar optimization procedure could have been performed for any of these cases with the presented sequential approach.
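The design correction introduced in step (S-7) can be sketched as a simple clipping rule. The exact form c2 = 4 - c1 is a reconstruction of the garbled text, consistent with the Fig. 3 caption stating that settings with c1 + c2 > 4 are avoided:

```python
# Design-correction sketch: fold candidate settings in the outlier region
# c1 + c2 > 4 back onto the boundary by adjusting c2 (hedged reconstruction
# of the rule stated in step (S-7)).
def correct_design(c1, c2):
    if c1 + c2 > 4.0:
        c2 = 4.0 - c1
    return c1, c2
```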
However, the optimal design presented in this study is only applicable to this specific optimization task (or experimental design). As in [6], the starting points have been initialized randomly in the range [15, 30]^n.

Fig. 4 Histogram and box plot. Left: Solid lines and light bars represent the improved design. Right: The default configuration is denoted as 1, whereas 2 denotes the improved variant. Top: Both plots indicate that the tuned inertia weight PSO version performs better than the default version. Bottom: No clear difference can be detected when comparing the default with the improved constriction factor PSO variant.

Table 7 Result table for the Rosenbrock function. Default designs from [6, 9, 3], as well as the improved design for all algorithms, for k = 50 runs, are reported.

Design        Algorithm     Mean      Median    StD       Min        Max
a^(Shi)       PSO (iner.)   .8383 3   59.6      3.965 3   64.6365    859
a*            PSO (iner.)   39.79     9.4443    55.383    .7866      54.9
a^(Clerc)     PSO (con.)    6.        58.5      378.8     4.55       .6 3
a*            PSO (con.)    6.9       37.65     65.9      .83        647.9
a^(Lagarias)  NMS           9.7       3.49      3.54 4    53.48      54966
a*            NMS           .9        9.57      .79       79.7893    73.4
              Quasi Newton  5.455     5.796     8.63      .657       6.8

Hence, different sources of randomness are mixed in this example. The following studies will be based on deterministically generated starting points, as recommended in [4].
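The classical t test used in step (S-11) to compare the default and tuned designs in Table 7 can be sketched as a pooled-variance two-sample statistic; the function name and the equal-variance assumption are illustrative:

```python
import math

# Two-sample t statistic (pooled-variance form) for the null hypothesis
# "no difference in the mean performances of the two algorithms".
def t_statistic(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))
```

A value of |t| exceeding the 5% critical point of the t distribution with na + nb - 2 degrees of freedom rejects the null hypothesis.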

Table 8 Default settings of the exogenous parameters of the NMS algorithm. This design is used in the MATLAB optimization toolbox [3].

Factor  Symbol  Parameter    Range            Default Value
z1      ρ       reflection   ρ > 0            1
z2      χ       expansion    χ > max{1, ρ}    2
z3      γ       contraction  0 < γ < 1        0.5
z4      σ       shrinkage    0 < σ < 1        0.5

Table 9 PSO constriction factor: Algorithm design. The variables s, χ, ϕ, and vmax have been defined earlier. a^(l) and a^(u) denote the ranges of the LHD, a* is the improved design, and a^(Clerc) is the design suggested in [9].

           s    χ       ϕ        vmax
a^(l)      5    .68     3.
a^(u)           .8      4.5      75
a*              7.759   3.5      34.438
a^(Clerc)       .79     4.

4.2 Optimizing the PSO constriction factor variant

The design of the PSO constriction factor variant was tuned in a similar manner as the inertia weight variant. The initial LHD is reported in Table 9, where a^(l) and a^(u) denote the lower and upper bounds of the region of interest, respectively, a* is the improved design that was found by the sequential procedure, and a^(Clerc) is the default design recommended in [9]. As can be seen from the numerical results reported in Table 7 and the corresponding graphical representations (histograms and boxplots) in Fig. 4, there is no significant difference between the performance of the tuned design a* and a^(Clerc) [9].

4.3 Optimizing the Nonlinear Simplex Algorithm

In the NMS algorithm, 4 parameters must be specified, namely the coefficients of reflection, ρ, expansion, χ, contraction, γ, and shrinkage, σ. Default settings are reported in Table 8. Figure 5 depicts two interesting results from the tuning of the NMS algorithm. The value of the reflection parameter, ρ, should be smaller than .5 (left). As can be seen in the right part of Fig. 5, there exists a relatively small local optimum regarding χ (expansion parameter) and ψ (contraction parameter), respectively.

4.4 Synopsis

In addition to the optimization algorithms analyzed so far, the performance of a Quasi-Newton method was analyzed.
An implementation from the commercial MATLAB optimization toolbox was used in the experiments. Quasi-Newton clearly outperformed the remaining algorithms, as can be seen from the results in Table 7. A comparison of the RLDs of the three algorithms is shown in Fig. 6. The results support the claim that PSO performs better than the NMS algorithm; only the tuned version of the latter was able to complete experiments successfully. Regarding the two PSO variants, it is not obvious which one performs better. After the tuning process, the inertia weight variant appears to be better, but it requires the specification of seven (compared to only four in the constriction factor variant) exogenous parameters. However, the Rosenbrock function is mostly of academic interest, since it lacks many features of a real world optimization problem.
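The run length distributions compared in Fig. 6 can be sketched as an empirical CDF over successful runs; `run_lengths`, `successes`, and `budget` are illustrative names, not from the paper:

```python
# Run-length-distribution sketch: for each budget t, the fraction of all
# runs that succeeded using at most t function evaluations. Unsuccessful
# runs never contribute a step to the curve.
def rld(run_lengths, successes, budget):
    n = len(run_lengths)
    return [sum(1 for L, ok in zip(run_lengths, successes) if ok and L <= t) / n
            for t in range(budget + 1)]
```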

Fig. 5 NMS simplex algorithm. Left: Predicted values. Function values (fitness) as a function of σ and ρ. The figure suggests that the value of ρ should be decreased. Right: Predicted values. A local optimum can be detected.

Fig. 6 Run length distribution. Step (S-11), the final comparison of the canonical and the improved design based on RLDs. Asterisks denote improved configurations. The improved inertia weight version of PSO succeeded in more than 80% of the experiments within a small number of function evaluations. The standard NMS algorithm failed completely, but with an improved design it was able to succeed in some of the runs after additional function evaluations. For the given budget of function evaluations, both the constriction factor and the improved inertia weight PSO variants perform equally well.

5 Results for the Real World Optimization Problem

5.1 The Elevator Group Controller Problem

The construction of elevators for high rise buildings is a challenging task. Today's urban life cannot be imagined without elevators. The elevator group controller is a central part of an elevator system. It assigns elevator cars to service calls in real time, while optimizing the overall service quality, the traffic throughput, and/or the energy consumption. The Elevator Supervisory Group Control (ESGC) problem can be classified as a combinatorial optimization problem [5-54]. It exhibits the same complex behavior as many other stochastic traffic control

Appl. Num. Anal. Comp. Math., No. 3 (4) / www.anacm.org 47 problems, such as materials hanling systems with Automate Guie Vehicles (AGVs). Due to many ifficulties in analysis, esign, simulation, an control, the elevator optimization problem has been stuie for a long time. First approaches were mainly base on analytical methos erive from queuing theory. However, toay, Computational Intelligence (CI) methos an other heuristics are accepte as state of the art [, 55]. The elevator group controller etermines the floors where the elevator cars shoul go to. Since the group controller is responsible for the allocation of elevators to hall calls, a control strategy or policy, π, to perform this task in an optimal manner is require. One important goal in esigning a better controller is the minimization of the time that passengers have to wait until they can enter an elevator car after having requeste service. This time span is calle the waiting time. The so calle service time inclues, aitionally, the time that a passenger remains in the elevator car. During a ay, ifferent traffic patterns can be observe. For example, in office builings, an up peak traffic is observe in the morning, when people start working, an, symmetrically, a own-peak traffic is observe in the evening. Most of the ay there is balance traffic with much lower intensity than at peak times. Lunchtime traffic consists of two (often overlapping) phases where people first leave the builing for lunch or hea for a restaurant floor, an then get back to work [56]. The ESGC problem subsumes the following problem, How to assign elevators to passengers in real time, while optimizing ifferent elevator configurations with respect to overall service quality, traffic throughput, energy consumption etc. Fujitec, one of the worl s leaing elevator manufacturers, evelope a controller that uses a Neural Network (NN) an a set of fuzzy controllers. 
The weights on the output layer of the NN can be modified and optimized, thereby resulting in a more efficient controller. The associated optimization problem is quite complex. Empirical investigations have shown that the distribution of local optima in the search space is unstructured, and there are flat plateaus of equal function values. Furthermore, function values are stochastically disturbed and dynamically changing, because they are influenced by the non deterministic behavior of customers in the building. Experiments have shown that gradient based optimization techniques cannot be applied successfully to such problems. Therefore, direct search algorithms have been chosen [57]. Elevator group controllers and the related policies are usually incomparable. To enable comparability for benchmark tests, a simplified elevator group control model, called the S-ring model, was developed. This model (a) enables fast and reproducible simulations, (b) is applicable to different buildings and traffic patterns, (c) is scalable and extensible, and (d) can be used as a test problem generator. The approach presented here uses both the actual Fujitec simulator, which is depicted in Fig. 7 and has high accuracy but heavy computational cost, and the coarse (surrogate) S-ring model, which is depicted in Fig. 8 and is fast to solve, although less accurate. The proposed approach also incorporates Space Mapping (SM) techniques, which are used to iteratively update and optimize surrogate models [59], and the main goal is the computation of an improved solution with a minimal number of function evaluations. Let i denote a site in an elevator system. Then, a 2-bit state, (s_i, c_i), is associated with it. The s_i bit is set to 1 if a server is present on the i-th floor, and 0 otherwise. Correspondingly, the c_i bit is set to 0 or 1 if there is no waiting passenger at site i, or at least one waiting passenger, respectively. Figure 8 depicts a typical S-ring configuration.
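The 2-bit site encoding just described can be sketched as follows; the helper name `sring_state` is illustrative:

```python
# Bit-vector state: each site i carries a server bit s_i and a customer
# bit c_i, and the system state concatenates them site by site, giving
# x = (s_1, c_1, ..., s_n, c_n) in B^(2n).
def sring_state(servers, customers):
    state = []
    for s, c in zip(servers, customers):
        state += [s, c]
    return state
```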
The state of the system at time t can be described by the vector

x(t) = \left( s_1(t), c_1(t), \ldots, s_n(t), c_n(t) \right) \in B^{2n},   (12)

where B = {0, 1}. The vector x(t) = (0, 1, ...) represents the state of the system that is shown in Fig. 8, i.e., there is a customer waiting on the first floor (c_1 = 1), but no server present (s_1 = 0), etc. A state transition table is used to model the dynamics of the system. The state evolution is sequential (S-ring stands for "sequential ring"), scanning the sites in descending order and then wrapping around. The up and down

Fig. 7 Visualization of the dynamics in an elevator system. Fujitec's elevator simulator representing the fine model. Six elevator cars are serving the floors of the building. This model is computationally expensive and has a high accuracy [57].

Fig. 8 The S-ring as an elevator system. Three cars are serving 6 floors (or 12 sites). The sites are numbered, and f denotes the floor number. There are two sites (up and down) for each floor. This is a coarse (surrogate) model that is fast to solve, but less accurate. Results obtained from this model should be transferable to other systems [58].

elevator movements can be considered as a loop. This motivates the ring structure. At each time step, one of the floor queues is considered, where passengers may arrive with a specific probability. Consider the situation at the third site (the up direction on the third floor) in Fig. 8. Since a customer is waiting and a server is present, the controller has to make a decision. The elevator car can either serve the customer ("take" decision) or ignore the customer ("pass" decision). The former decision would change the values of the corresponding bits of x(t) from (1, 1) to (1, 0), while the latter from (1, 1) to (0, 1). The rules of operation of this model are very simple; thus, it is easily reproducible and suitable for benchmark testing. Despite the model's simplicity, it is hard to find the optimal policy, π*, even for a small S-ring. The true π* is not obvious, and its difference from heuristic suboptimal policies is non trivial. So far, the S-ring has been described as a simulation model. To use it as an optimization problem, it is equipped with an objective function. Consider the function that counts the sites with waiting customers at time t,

Q(t) = Q(x, t) = \sum_{i=1}^{n} c_i(t).

Then, the steady state, time average number of sites with waiting customers in the queue is

\bar{Q} = \lim_{T \to \infty} \frac{1}{T} \int_0^T Q(t) \, dt, \quad \text{with probability 1}.   (13)

The basic optimal control problem is to find a policy, π*, for a given S-ring configuration, such that the expected number, \bar{Q}, of sites with waiting passengers (that is, the steady state, time average defined in Eq. (13)) is minimized, i.e.,

π* = \arg\min_{π} \bar{Q}(π).   (14)

A 2n-dimensional vector, y \in R^{2n}, can be used to represent the policy. Let θ: R \to B define the Heaviside function,

θ(z) = \begin{cases} 0, & z < 0, \\ 1, & z \geq 0, \end{cases}   (15)

let x = x(t) be the state at time t as defined by Eq. (12), and let y \in R^{2n} be a weight vector. A linear discriminator, or perceptron,

π(x, y) = θ\left( \langle x, y \rangle \right),   (16)

can be used to model the decision process in a compact manner, where \langle x, y \rangle = \sum_i x_i y_i.
For a given vector, y, that represents the policy, and a given vector, x, that represents the state of the system, a "take" decision occurs if π(x, y) = 1; otherwise, the elevator will ignore the customer. The most obvious heuristic policy is the greedy one: when given the choice, always serve the customer. The 2n-dimensional vector y = (1, 1, ..., 1) can be used to represent the greedy policy. This vector guarantees that the result in Eq. (16) equals 1, which is interpreted as a "take" decision. Rather counter-intuitively, this policy is not optimal, except in the case of heavy traffic. This means that a good policy must occasionally bypass some customers to prevent a phenomenon known as bunching, which occurs in elevator systems when nearly all elevator cars are positioned in close proximity to one another. The perceptron S-ring problem can serve as a benchmark problem for many optimization algorithms, since it relies on a fitness function that maps R^{2n} to R [58, 6]. In general, π can be realized as a look up table of the system state, x, and π* is found by enumerating all possible π and selecting the one with the lowest value of the function \bar{Q}. Since this count grows exponentially with n, the enumerative approach would not work for any but the smallest cases.
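The perceptron policy of Eq. (16) can be sketched directly; the ring size n = 6 is taken from Fig. 8 for illustration:

```python
# Perceptron policy, Eq. (16): theta(<x, y>) yields a "take" decision
# (value 1) when the inner product of state x and weight vector y is
# non-negative, and a "pass" decision (value 0) otherwise.
def policy(x, y):
    return 1 if sum(xi * yi for xi, yi in zip(x, y)) >= 0 else 0

# All-ones weight vector: the greedy policy for an n = 6 site ring (2n weights).
greedy = [1.0] * 12
```

With non-negative state bits and all-ones weights the inner product is never negative, which is exactly why the greedy vector always produces a "take" decision.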