LECTURE 16: SWARM INTELLIGENCE 2 / PARTICLE SWARM OPTIMIZATION 2

15-382 COLLECTIVE INTELLIGENCE - S18
INSTRUCTOR: GIANNI A. DI CARO

BACKGROUND: REYNOLDS BOIDS
Reynolds created a model of coordinated animal motion in which the agents (boids) obey three simple local rules:
- Separation: steer to avoid crowding local flockmates
- Alignment: steer towards the average heading of local flockmates
- Cohesion: steer to move toward the average position of local flockmates
[Figure: a boid's field of view]
Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. Computer Graphics, 21(4), pp. 25-34, 1987
https://www.youtube.com/watch?v=qbupfmxxqiy
Play with Boids, PSO, CA, and with NetLogo: https://ccl.northwestern.edu/netlogo/

BOIDS + ROOSTING BEHAVIOR
Kennedy and Eberhart included a roost or, more generally, an attraction point (e.g., a prey) in a simplified Boids-like simulation, such that each agent:
- is attracted to the location of the roost,
- remembers where it was closest to the roost,
- shares information with its neighbors about its closest location to the roost.
Eventually, (almost) all agents land on the roost.
What if:
- roost = (unknown) extremum of a function
- distance to the roost = quality of the current agent position on the optimization landscape

PARTICLE SWARM OPTIMIZATION (PSO)
- PSO consists of a swarm of bird-like particles: a multi-agent system
- At each discrete time step, each particle is found at a position in the search space: it encodes a solution point $\vec{x} \in X^n \subseteq \mathbb{R}^n$ for an optimization problem with objective function $f(\vec{x}): X^n \to \mathbb{R}^m$ (black-box or white-box problem)
- The fitness of each particle represents the quality of its position on the optimization landscape (note: this can be virtually anything, think about robots looking for energy)
- Particles move over the search space with a certain velocity
- Each particle has an internal state $\{\vec{x}, \vec{v}, \vec{x}_{pbest}, N(p)\}$, where $N(p)$ is its neighborhood (a network of social connections)
- At each time step, the velocity (both direction and speed) of each particle is influenced, with some randomness, by:
  - pbest: its own best position found so far
  - lbest: the best solution found so far by the teammates in its social neighborhood, and/or
  - gbest: the global best solution found so far
- Eventually the swarm will converge to optimal positions
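The per-particle state listed on this slide can be sketched as a small record. This is only an illustration: the names `Particle`, `make_particle`, and the extra `f_pbest` bookkeeping field are assumptions made here, not part of the slides.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Particle:
    """Internal state {x, v, x_pbest, N(p)} of one particle (illustrative names)."""
    x: np.ndarray        # current position in the search space
    v: np.ndarray        # current velocity
    x_pbest: np.ndarray  # best position this particle has visited so far
    f_pbest: float       # fitness at x_pbest (added bookkeeping field, an assumption)
    neighbors: list = field(default_factory=list)  # N(p): indices of social neighbors


def make_particle(rng, lower, upper, fitness):
    """Draw a uniform random position in [lower, upper] and start at rest."""
    x = rng.uniform(lower, upper)
    return Particle(x=x, v=np.zeros_like(x), x_pbest=x.copy(),
                    f_pbest=float(fitness(x)))
```

Starting from zero velocity is one common choice; random initial velocities are an equally plausible alternative.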

NEIGHBORHOODS
[Figure: geographical, social, and global neighborhoods]

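VECTOR COMBINATION OF MULTIPLE BIASES
[Figure: vector diagram at the current position $\vec{x}_t$, showing the velocity $\vec{v}_t$, the scaled velocity $\omega \vec{v}_t$, the personal bias $r_1 (\vec{x}_{pbest} - \vec{x}_t)$, the social bias $r_2 (\vec{x}_{lbest} - \vec{x}_t)$, and the resulting new position $\vec{x}_{t+1}$]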

PARTICLE SWARM OPTIMIZATION (PSO)
- $\otimes$: element-wise multiplication operator
- $\vec{r}_1 = U(0, 1)$, $\vec{r}_2 = U(0, 2)$
- $\phi$ are acceleration coefficients determining the scale of the forces in the direction of the individual and social biases
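For reference, the update rules that these symbols belong to can be sketched as follows. This is the form commonly given in the PSO literature, written here without an inertia weight (equivalently, weight 1); it is an assumption that the slide showed this exact form, and $\phi_1$, $\phi_2$ denote the individual and social acceleration coefficients.

```latex
\begin{align}
\vec{v}_{t+1} &= \vec{v}_t
  + \phi_1\, \vec{r}_1 \otimes (\vec{x}_{pbest} - \vec{x}_t)
  + \phi_2\, \vec{r}_2 \otimes (\vec{x}_{lbest} - \vec{x}_t) \\
\vec{x}_{t+1} &= \vec{x}_t + \vec{v}_{t+1}
\end{align}
```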

VECTOR COMBINATION OF MULTIPLE BIASES
1. Inertia
   - Makes the particle move in the same direction and with the same velocity
2. Personal Influence
   - Improves the individual
   - Makes the particle return to a previous position that was better than the current one
   - Conservative
3. Social Influence
   - Makes the particle follow the direction of its best neighbors
Inertia drives the search for new solutions; personal and social influence exploit what has been good so far.
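The three components above can be put together in a minimal global-best PSO loop for a maximization problem. This is a sketch under stated assumptions, not the lecture's implementation: the function name `pso_maximize`, the parameter values (`w`, `phi1`, `phi2`, swarm size, number of steps), and drawing both random vectors from U(0, 1) are choices made here for illustration.

```python
import numpy as np


def pso_maximize(f, lower, upper, n_particles=30, n_steps=200,
                 w=0.7, phi1=1.5, phi2=1.5, seed=0):
    """Minimal gbest PSO sketch; all default values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    x = rng.uniform(lower, upper, size=(n_particles, dim))  # positions
    v = np.zeros((n_particles, dim))                        # velocities
    x_pbest = x.copy()                                      # personal bests
    f_pbest = np.array([f(p) for p in x])

    for _ in range(n_steps):
        g = x_pbest[np.argmax(f_pbest)].copy()  # best position known to the swarm
        r1 = rng.random((n_particles, dim))     # assumed U(0, 1)
        r2 = rng.random((n_particles, dim))     # assumed U(0, 1)

        inertia = w * v                          # 1. keep moving the same way
        personal = phi1 * r1 * (x_pbest - x)     # 2. pull toward own best position
        social = phi2 * r2 * (g - x)             # 3. pull toward the swarm's best

        v = inertia + personal + social
        x = x + v                                # positions are not clipped to the bounds

        fx = np.array([f(p) for p in x])
        improved = fx > f_pbest                  # maximization: larger is better
        x_pbest[improved] = x[improved]
        f_pbest[improved] = fx[improved]

    best = np.argmax(f_pbest)
    return x_pbest[best], f_pbest[best]
```

For example, `pso_maximize(lambda z: -np.sum((z - 1.0) ** 2), [-5, -5], [5, 5])` searches for the maximum of a concave function whose peak is at (1, 1).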

PSO AT WORK (MAX OPTIMIZATION PROBLEM)
Example slides from Pinto et al.

GOOD AND BAD POINTS OF BASIC PSO
Advantages
- Quite insensitive to scaling of design variables
- Simple implementation
- Easily parallelized for concurrent processing
- Derivative-free / black-box optimization
- Very few algorithm parameters
- Very efficient global search algorithm
Disadvantages
- Tendency toward fast, premature convergence at intermediate (suboptimal) optimum points
- Slow convergence in the refined search stage (weak local search ability)

GOOD NEIGHBORHOOD TOPOLOGY?
Also considered were:
- Clustering topologies (islands)
- Dynamic topologies
There is no clear way of saying which topology is the best: exploration / exploitation dilemma
- Some neighborhood topologies are better for local search, others for global search
- lbest neighborhood topologies seem better for global search, gbest topologies seem better for local search
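To make the lbest / gbest distinction concrete, here is a small sketch that computes each particle's attractor under two topologies. The ring topology is used as one example of an lbest neighborhood; that choice, the maximization convention, and the function names `ring_lbest` and `global_best` are assumptions made for illustration.

```python
import numpy as np


def ring_lbest(x_pbest, f_pbest, k=1):
    """lbest on a ring: best personal best among each particle and its
    k neighbors on either side (maximization, indices wrap around)."""
    x_pbest = np.asarray(x_pbest)
    f_pbest = np.asarray(f_pbest)
    n = len(f_pbest)
    offsets = np.arange(-k, k + 1)
    idx = (np.arange(n)[:, None] + offsets[None, :]) % n  # (n, 2k+1) neighbor indices
    winner = idx[np.arange(n), np.argmax(f_pbest[idx], axis=1)]
    return x_pbest[winner]


def global_best(x_pbest, f_pbest):
    """gbest: every particle is attracted to the single best personal best."""
    x_pbest = np.asarray(x_pbest)
    f_pbest = np.asarray(f_pbest)
    return np.repeat(x_pbest[[np.argmax(f_pbest)]], len(f_pbest), axis=0)
```

With a ring, information about a good position spreads only one neighborhood per step, which slows convergence and keeps the swarm more diverse; with gbest, every particle is pulled toward the same point immediately.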

ACCELERATION COEFFICIENTS
[Figure: the boxes show the distribution of the random vectors of the attracting forces of the local best and global best]
The acceleration coefficients $\phi_1$ and $\phi_2$ determine the scale of the distribution of the random individual (cognitive) component vector and of the random social component vector, respectively.

ACCELERATION COEFFICIENTS
- $\phi_1 > 0$, $\phi_2 = 0$: particles are independent hill-climbers
- $\phi_1 = 0$, $\phi_2 > 0$: the swarm is one stochastic hill-climber
- $\phi_1 = \phi_2 > 0$: particles are attracted to the average of $p_i$ and $g_i$
- $\phi_2 > \phi_1$: more beneficial for unimodal problems
- $\phi_1 > \phi_2$: more beneficial for multimodal problems
- low $\phi_1$, $\phi_2$: smooth particle trajectories
- high $\phi_1$, $\phi_2$: more acceleration, abrupt movements
Adaptive acceleration coefficients have also been proposed, for example decreasing $\phi_1$, $\phi_2$ over time (as in Simulated Annealing)

ORIGINAL PSO: ISSUES
- The acceleration coefficients should be set sufficiently high to cover large search spaces
- High acceleration coefficients result in less stable systems in which the velocity has a tendency to explode!
- To fix this, the velocity $\vec{v}$ is usually kept within the range $[-v_{max}, v_{max}]$
- However, limiting the velocity does not necessarily prevent particles from leaving the search space, nor does it help to guarantee convergence :(
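Velocity clamping as described above can be sketched in one line with NumPy. The helper name `clamp_velocity` and the per-component clamping are assumptions made here; the short example after the function illustrates why clamping alone does not keep a particle inside the search space.

```python
import numpy as np


def clamp_velocity(v, v_max):
    """Keep every velocity component within [-v_max, v_max]."""
    return np.clip(v, -v_max, v_max)


# A particle near the upper boundary of the search space [-1, 1]:
x = np.array([0.9])
v = clamp_velocity(np.array([5.0]), v_max=0.5)  # clamped from 5.0 to 0.5
x_next = x + v                                  # 1.4: outside [-1, 1] despite clamping
```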

INERTIA COEFFICIENT
The inertia weight $\omega$ was introduced to control the velocity explosion:
$$\vec{v}_{t+1} = \omega\, \vec{v}_t + \phi_1\, \vec{r}_1 \otimes (\vec{x}_{pbest} - \vec{x}_t) + \phi_2\, \vec{r}_2 \otimes (\vec{x}_{lbest} - \vec{x}_t)$$
If $\omega$, $\phi_1$, $\phi_2$ are set correctly, this update rule allows for convergence without the use of $v_{max}$.
The inertia weight can be used to control the balance between exploration and exploitation:
- $\omega \geq 1$: velocities increase over time, the swarm diverges
- $0 < \omega < 1$: particles decelerate, convergence depends on $\phi_1$, $\phi_2$

CONSTRICTION COEFFICIENT
- Takes away some of the guesswork in setting $\omega$, $\phi_1$, $\phi_2$
- The constriction coefficient is an elegant method for preventing explosion, ensuring convergence, and eliminating the parameter $v_{max}$
- The constriction coefficient was introduced as:
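For reference, the form of the constriction coefficient usually attributed to Clerc and Kennedy is $\chi = 2\kappa / \left| 2 - \phi - \sqrt{\phi^2 - 4\phi} \right|$ with $\phi = \phi_1 + \phi_2 > 4$ and $\kappa \in [0, 1]$, applied as $\vec{v}_{t+1} = \chi\,[\vec{v}_t + \phi_1 \vec{r}_1 \otimes (\vec{x}_{pbest} - \vec{x}_t) + \phi_2 \vec{r}_2 \otimes (\vec{x}_{lbest} - \vec{x}_t)]$. This is stated from the general PSO literature, so it is an assumption that the slide used exactly this notation. A minimal sketch for computing $\chi$ (the function name `constriction` is hypothetical):

```python
import math


def constriction(phi1, phi2, kappa=1.0):
    """Constriction coefficient chi in the commonly cited Clerc-Kennedy form.

    Assumes phi = phi1 + phi2 > 4 so that the square root is real.
    """
    phi = phi1 + phi2
    if phi <= 4.0:
        raise ValueError("this form assumes phi1 + phi2 > 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


chi = constriction(2.05, 2.05)  # roughly 0.73 for the widely used phi1 = phi2 = 2.05
```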

FULLY INFORMED PSO (FIPS)
- From best to average: each particle is affected by all of its K neighbors by taking their average (the personal best may or may not be included)
- The velocity update in FIPS is:
- FIPS outperforms the canonical PSOs on most test problems
- The performance of FIPS is generally more dependent on the neighborhood topology (global best neighborhood topology is recommended)
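A form of the FIPS velocity update often given in the literature is $\vec{v}_{t+1} = \chi \left[ \vec{v}_t + \frac{1}{K} \sum_{n=1}^{K} \vec{U}(0, \phi) \otimes (\vec{x}_{pbest,n} - \vec{x}_t) \right]$, where the sum runs over the personal bests of the K neighbors. The sketch below implements that form; the function name `fips_velocity` and the default values $\chi = 0.7298$, $\phi = 4.1$ are assumptions taken from common practice, not from the slide.

```python
import numpy as np


def fips_velocity(x, v, neighbor_pbests, chi=0.7298, phi=4.1, rng=None):
    """One FIPS velocity update for a single particle (illustrative sketch).

    neighbor_pbests has shape (K, dim): the personal bests of the K neighbors.
    """
    rng = np.random.default_rng() if rng is None else rng
    neighbor_pbests = np.asarray(neighbor_pbests, dtype=float)
    k = neighbor_pbests.shape[0]
    r = rng.uniform(0.0, phi, size=neighbor_pbests.shape)  # one U(0, phi) vector per neighbor
    pull = (r * (neighbor_pbests - x)).sum(axis=0) / k      # average of the K attractions
    return chi * (v + pull)
```

Every neighbor contributes to the update, so no single best position dominates; whether the particle's own personal best is included is decided by what is passed in `neighbor_pbests`.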

TYPICAL BENCHMARK FUNCTIONS

PERFORMANCE VARIANCE
- Minimization problems
- Best solution found over the iterations
- Improvement in performance differs according to the different strategic choices
- No single winner
- Early stagnation of performance can occur

BINARY / DISCRETE PSO
- A simple modification of the continuous version
- Velocity remains continuous, using the original update rule
- Positions are updated using the velocity as a probability threshold to determine whether the j-th component of the i-th particle is a zero or a one
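The position update above can be sketched as follows. Squashing the velocity through a logistic (sigmoid) function to obtain the probability of a one is the usual choice in the binary PSO literature, but it is an assumption here, since the slide does not spell out the mapping; `binary_position_update` is a hypothetical name.

```python
import numpy as np


def binary_position_update(v, rng):
    """Sample a 0/1 position from a continuous velocity array.

    Each component becomes 1 with probability sigmoid(v_ij), else 0.
    """
    prob_one = 1.0 / (1.0 + np.exp(-v))  # logistic squashing (assumed mapping)
    return (rng.random(v.shape) < prob_one).astype(int)
```

A strongly positive velocity component makes a one almost certain, a strongly negative one makes a zero almost certain, and a velocity of zero gives a fair coin flip.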

ANALYSIS, GUARANTEES
Hard because:
- Stochastic search algorithm
- Complex group dynamics
- Performance depends on the search landscape
Approaches:
- Theoretical analysis has been done with simplified PSOs on simplified problems
- Graphical examinations of the trajectories of individual particles and their responses to variations in key parameters
- Empirical performance distributions

SUMMARY PSO
- Inspired by social and roosting behaviors in bird flocking
- Easy to implement, easy to get good results with wise parameter tuning (but just a few parameters)
- Computationally light
- Exploitation-exploration dilemma
- A number of variants
- A few theoretical properties (hard to derive for general cases)
- Mostly applied to continuous function optimization, but also to combinatorial optimization, and robotics / distributed systems
References:
- J. Kennedy, R. Eberhart, Y. Shi: Swarm Intelligence. Morgan Kaufmann, 2001
- A. Engelbrecht: Computational Intelligence. Wiley, 2007