Extra Slides for Lectures 1-3: Introduction to Evolutionary Algorithms

The material in these slides was more or less presented during the lectures, compiled by TM from: A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing - slides, Chapters 1-9.

Pseudo-code for a typical EA
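The generic EA cycle (initialise, then repeat parent selection, variation, evaluation, survivor selection) can be sketched in Python. This is a minimal illustration, not the book's pseudo-code: the operator choices, parameter values, and the OneMax example problem below are my own assumptions.

```python
import random

def evolve(fitness, init, recombine, mutate, pop_size=20, generations=50):
    """Generic EA loop: initialise, then repeat parent selection,
    recombination, mutation, and survivor selection."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        # Parent selection: fitness-proportional random choice
        # (+1 on each weight avoids an all-zero wheel; a simplification).
        parents = random.choices(population,
                                 weights=[fitness(p) + 1 for p in population],
                                 k=pop_size)
        # Variation: recombine pairs of parents, then mutate each child.
        offspring = []
        for i in range(0, pop_size, 2):
            c1, c2 = recombine(parents[i], parents[i + 1])
            offspring += [mutate(c1), mutate(c2)]
        # Survivor selection: keep the best pop_size of parents + offspring.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
    return max(population, key=fitness)

# Usage: maximise the number of ones in a 10-bit string (OneMax).
best = evolve(fitness=sum,
              init=lambda: [random.randint(0, 1) for _ in range(10)],
              recombine=lambda a, b: (a[:5] + b[5:], b[:5] + a[5:]),
              mutate=lambda c: [1 - g if random.random() < 0.1 else g
                                for g in c])
```

With an elitist survivor step like this, the best fitness in the population never decreases, which is what produces the monotone "anytime" curves discussed on the next slides.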
Typical behaviour of an EA

Phases in optimising on a 1-dimensional fitness landscape:
- Early phase: quasi-random population distribution
- Mid-phase: population arranged around/on hills
- Late phase: population concentrated on high hills

Typical run: progression of fitness
(Figure: best fitness in the population vs. time, in number of generations)
A typical run of an EA shows so-called anytime behaviour.
Are long runs beneficial?
(Figure: best fitness in the population vs. time; the progress in the 2nd half is much smaller than the progress in the 1st half)
Answer: it depends
- on how much you want the last bit of progress
- it may be better to do more, shorter runs

Is it worth expending effort on smart initialisation?
(Figure: best fitness vs. time; F = fitness after smart initialisation, T = time needed to reach level F after random initialisation)
Answer: it depends
- possibly, if good solutions/methods exist
- care is needed, see the chapter on hybridisation
Example: on-line performance measure evaluation
(Figure: mean best fitness (M.B.F.) vs. evaluations, for Algorithm A and Algorithm B)
Which algorithm is better? Why, and when?

Example: averaging on-line measures
(Figure: fitness curves for Run 1 and Run 2, together with their average, vs. time)
Averaging can hide interesting information.
Example: overlaying on-line measures
(Figure: many per-run fitness curves overlaid vs. time)
Overlaying curves can lead to very cloudy figures.

Representation
(Figure: phenotype space mapped to genotype space {0,1}^L by encoding (representation), e.g. bit strings such as 10010001 and 10010010; decoding (inverse representation) maps genotypes back to phenotypes)
SGA operators: 1-point crossover
- Choose a random point on the two parents
- Split the parents at this crossover point
- Create children by exchanging tails
- p_c typically in the range (0.6, 0.9)

SGA operators: mutation
- Alter each gene independently with probability p_m
- p_m is called the mutation rate
- Typically between 1/pop_size and 1/chromosome_length
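The two SGA variation operators above can be sketched directly; the 8-bit parents and the p_m = 1/length default in the usage line are illustrative choices, not prescribed by the slides.

```python
import random

def one_point_crossover(p1, p2):
    """Split both parents at one random point and swap the tails."""
    point = random.randint(1, len(p1) - 1)   # the crossover point
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def bitwise_mutation(chrom, p_m):
    """Flip each gene independently with probability p_m."""
    return [1 - g if random.random() < p_m else g for g in chrom]

# Usage on two 8-bit parents; p_m = 1/chromosome_length is a common default.
a, b = [0] * 8, [1] * 8
c1, c2 = one_point_crossover(a, b)
c1 = bitwise_mutation(c1, p_m=1 / len(c1))
```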
SGA operators: selection
- Main idea: better individuals get a higher chance
- Chances proportional to fitness
- Implementation: roulette wheel technique
  - assign to each individual a part of the roulette wheel
  - spin the wheel n times to select n individuals
(Example: fitness(A) = 3, fitness(B) = 1, fitness(C) = 2, so A gets 3/6 = 50% of the wheel, B gets 1/6 = 17%, and C gets 2/6 = 33%)

An example after Goldberg '89 (1)
- Simple problem: max x^2 over {0, 1, ..., 31}
- GA approach:
  - representation: binary code, e.g. 01101 = 13
  - population size: 4
  - 1-point xover, bitwise mutation
  - roulette wheel selection
  - random initialisation
- We show one generational cycle done by hand
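The roulette wheel can be sketched as a cumulative sum scan; the sample size of 6000 in the usage line is only there to make the proportions visible.

```python
import random

def roulette_wheel(population, fitnesses, n):
    """Select n individuals with probability proportional to fitness."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(n):
        r = random.uniform(0, total)          # spin the wheel once
        cumulative = 0.0
        for ind, fit in zip(population, fitnesses):
            cumulative += fit                 # walk along the wheel slices
            if cumulative >= r:
                chosen.append(ind)
                break
    return chosen

# Usage: the slide's example, fitness(A)=3, fitness(B)=1, fitness(C)=2.
picks = roulette_wheel(["A", "B", "C"], [3, 1, 2], n=6000)
# picks contains roughly 50% A, 17% B, 33% C.
```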
Alternative crossover operators
- Performance with 1-point crossover depends on the order in which variables occur in the representation
  - more likely to keep together genes that are near each other
  - can never keep together genes from opposite ends of the string
- This is known as positional bias
- Can be exploited if we know about the structure of our problem, but this is not usually the case

n-point crossover
- Choose n random crossover points
- Split along those points
- Glue the parts, alternating between parents
- Generalisation of 1-point crossover (still some positional bias)
Uniform crossover
- Assign 'heads' to one parent, 'tails' to the other
- Flip a coin for each gene of the first child
- Make an inverse copy of each gene for the second child
- Inheritance is independent of position

Other representations
- Gray coding of integers (still binary chromosomes)
  - Gray coding is a mapping under which small changes in the genotype cause small changes in the phenotype (unlike standard binary coding)
  - a smoother genotype-phenotype mapping makes life easier for the GA
- Nowadays it is generally accepted that it is better to encode numerical variables directly as
  - integers
  - floating point variables
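The Gray coding mentioned above is, in its usual binary-reflected form, a one-line bit trick to encode and a cascade of XORs to decode; this sketch shows why adjacent integers differ in exactly one bit.

```python
def to_gray(n):
    """Binary-reflected Gray code: adjacent integers differ in one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert the Gray code by cascading XORs from the top bit down."""
    n = g
    while g:
        g >>= 1
        n ^= g
    return n

# Usage: 3 -> 0b010 and 4 -> 0b110 differ in a single bit,
# whereas plain binary 011 -> 100 flips all three bits at once.
assert to_gray(3) == 0b010 and to_gray(4) == 0b110
assert all(from_gray(to_gray(i)) == i for i in range(256))
```

This single-bit-flip property is exactly what makes the genotype-phenotype mapping "smoother" for mutation: a one-bit mutation always moves the phenotype by a small amount somewhere along the scale, never across it.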
Integer representations
- Some problems naturally have integer variables, e.g. image processing parameters
- Others take categorical values from a fixed set, e.g. {blue, green, yellow, pink}
- N-point / uniform crossover operators work
- Extend bit-flipping mutation to make
  - "creep", i.e. more likely to move to a similar value
  - random choice (esp. for categorical variables)
- For ordinal problems it is hard to know the correct range for creep, so often two mutation operators are used in tandem

Real-valued problems
- Many problems occur as real-valued problems, e.g. continuous parameter optimisation f : R^n -> R
- Illustration: Ackley's function (often used in EC)
Mapping real values onto bit strings
- z in [x, y] in R represented by (a_1, ..., a_L) in {0,1}^L
- [x, y] -> {0,1}^L must be invertible (one phenotype per genotype)
- Γ: {0,1}^L -> [x, y] defines the representation:

  Γ(a_1, ..., a_L) = x + (y - x)/(2^L - 1) * Σ_{j=0}^{L-1} a_{L-j} * 2^j  in [x, y]

- Only 2^L values out of infinitely many are represented
- L determines the maximum possible precision of the solution
- High precision -> long chromosomes (slow evolution)

Floating point mutations 1
- General scheme of floating point mutations:
  x = (x_1, ..., x_l)  ->  x' = (x'_1, ..., x'_l),  with x_i, x'_i in [LB_i, UB_i]
- Uniform mutation: x'_i drawn randomly (uniformly) from [LB_i, UB_i]
- Analogous to bit-flipping (binary) or random resetting (integers)
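The decoding map Γ is a weighted binary sum rescaled onto [x, y]; this sketch implements it under the standard convention that a_1 is the most significant bit. The interval [-5, 5] and the 8-bit length in the usage lines are arbitrary examples.

```python
def decode(bits, x, y):
    """Gamma: map a bit string (a_1, ..., a_L) onto a real value in [x, y]:
    x + (y - x)/(2^L - 1) * sum_{j=0}^{L-1} a_{L-j} * 2^j."""
    L = len(bits)
    value = sum(bits[L - 1 - j] << j for j in range(L))  # plain binary value
    return x + (y - x) / (2 ** L - 1) * value

# Usage: 8-bit strings cover [-5, 5] at 2^8 = 256 discrete values.
assert decode([0] * 8, -5, 5) == -5.0                 # all zeros -> lower bound
assert abs(decode([1] * 8, -5, 5) - 5.0) < 1e-9       # all ones  -> upper bound
```

Note the 2^L - 1 denominator: it ensures the all-ones string decodes exactly to the upper bound y, so the 2^L representable values are spread evenly over the whole interval.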
Floating point mutations 2
- Non-uniform mutations:
  - many methods have been proposed, such as a time-varying range of change
  - most schemes are probabilistic, but usually only make a small change to the value
  - most common method: add a random deviate drawn from a Gaussian distribution N(0, σ) to each variable separately, then curtail to the allowed range
  - the standard deviation σ controls the amount of change (about 2/3 of the deviates lie in the range (-σ, +σ))

Crossover operators for real-valued GAs
- Discrete: each allele value in offspring z comes from one of its parents (x, y) with equal probability: z_i = x_i or y_i
  - could use n-point or uniform crossover
- Intermediate: exploits the idea of creating children between the parents (hence a.k.a. arithmetic recombination):
  z_i = α * x_i + (1 - α) * y_i,  where 0 <= α <= 1
- The parameter α can be:
  - constant: uniform arithmetical crossover
  - variable (e.g. depending on the age of the population)
  - picked at random every time
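The Gaussian ("non-uniform") mutation described above is a few lines; σ = 0.1 and the [0, 1] bounds in the usage line are illustrative assumptions.

```python
import random

def gaussian_mutation(x, sigma, bounds):
    """Add an N(0, sigma) deviate to each variable separately,
    then curtail the result to that variable's range."""
    child = []
    for xi, (lb, ub) in zip(x, bounds):
        xi += random.gauss(0, sigma)          # small change most of the time
        child.append(min(max(xi, lb), ub))    # curtail to [lb, ub]
    return child

# Usage: mutate a 3-variable individual, each variable constrained to [0, 1].
x = [0.2, 0.5, 0.9]
y = gaussian_mutation(x, sigma=0.1, bounds=[(0.0, 1.0)] * 3)
```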
Single arithmetic crossover
- Parents: (x_1, ..., x_n) and (y_1, ..., y_n)
- Pick a single gene k at random; child 1 is:
  (x_1, ..., x_{k-1}, α * y_k + (1 - α) * x_k, x_{k+1}, ..., x_n)
- Reverse the roles of the parents for the other child, e.g. with α = 0.5

Simple arithmetic crossover
- Parents: (x_1, ..., x_n) and (y_1, ..., y_n)
- Pick a random gene k; after this point, mix the values; child 1 is:
  (x_1, ..., x_k, α * y_{k+1} + (1 - α) * x_{k+1}, ..., α * y_n + (1 - α) * x_n)
- Reverse the roles of the parents for the other child, e.g. with α = 0.5
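Both operators can be sketched as follows; α = 0.5 (plain averaging) is the default here, and the random choice of the gene k uses Python's `randrange` as an illustrative detail.

```python
import random

def single_arithmetic(x, y, alpha=0.5):
    """Blend exactly one randomly chosen gene k; copy the rest unchanged."""
    k = random.randrange(len(x))
    c1, c2 = list(x), list(y)
    c1[k] = alpha * y[k] + (1 - alpha) * x[k]
    c2[k] = alpha * x[k] + (1 - alpha) * y[k]
    return c1, c2

def simple_arithmetic(x, y, alpha=0.5):
    """Blend every gene after a random point k; copy the head unchanged."""
    k = random.randrange(len(x))
    c1 = list(x[:k]) + [alpha * yi + (1 - alpha) * xi
                        for xi, yi in zip(x[k:], y[k:])]
    c2 = list(y[:k]) + [alpha * xi + (1 - alpha) * yi
                        for xi, yi in zip(x[k:], y[k:])]
    return c1, c2

# Usage: with alpha = 0.5 every blended gene is the parents' average.
c1, c2 = single_arithmetic([0.1, 0.2, 0.3], [0.3, 0.2, 0.1])
```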
Whole arithmetic crossover
- Most commonly used
- Parents: (x_1, ..., x_n) and (y_1, ..., y_n); child 1 is:
  α * x + (1 - α) * y  (applied to the whole vectors)
- Reverse the roles of the parents for the other child, e.g. with α = 0.5

Constrained problems
- Constraints:
  - g_i(x) <= 0 for i = 1, ..., q: inequality constraints
  - h_i(x) = 0 for i = q+1, ..., m: equality constraints
- Can be handled by penalties:
  eval(x) = f(x) + W * penalty(x)
  where penalty(x) = Σ_{j=1}^{m} p_j, with p_j = 1 for a violated constraint and p_j = 0 for a satisfied constraint
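The penalty scheme above (a constant W times the count of violated constraints) can be sketched as below. The weight W = 100, the tolerance for equality constraints, and the x^2 example problem are illustrative assumptions; here smaller eval values are better (minimisation).

```python
def eval_with_penalty(f, x, ineq, eq, W=100.0, tol=1e-9):
    """eval(x) = f(x) + W * (number of violated constraints).
    ineq: functions g requiring g(x) <= 0; eq: functions h requiring h(x) = 0."""
    violations = sum(1 for g in ineq if g(x) > tol)        # g(x) <= 0 violated
    violations += sum(1 for h in eq if abs(h(x)) > tol)    # h(x) = 0 violated
    return f(x) + W * violations

# Usage (minimisation): f(x) = x^2 subject to x >= 1, written as 1 - x <= 0.
good = eval_with_penalty(lambda x: x * x, 2.0, ineq=[lambda x: 1 - x], eq=[])
bad  = eval_with_penalty(lambda x: x * x, 0.5, ineq=[lambda x: 1 - x], eq=[])
# The infeasible point pays the penalty: bad = 0.25 + 100, good = 4.0.
```

Counting violations makes the penalty graded but coarse; the next slides refine the scheme by making W vary over the run instead of staying constant.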
Varying penalty: option 1
- Replace the constant W by a function W(t):
  W(t) = (C * t)^α
  where t, 0 <= t <= T, is the current generation number
- Features:
  - changes in W are independent of the search progress
  - strong user control of W by the above formula
  - W is fully predictable
  - a given W acts on all individuals of the population

Varying penalty: option 2
- Replace the constant W by W(t), updated in each generation:
  W(t+1) = β * W(t)  if the last k champions were all feasible
  W(t+1) = γ * W(t)  if the last k champions were all infeasible
  W(t+1) = W(t)      otherwise
  with β < 1, γ > 1, β * γ != 1; champion: the best individual of its generation
- Features:
  - changes in W are based on feedback from the search progress
  - some user control of W by the above formula
  - W is not predictable
  - a given W acts on all individuals of the population
Varying penalty: option 3
- Assign a personal W to each individual
- Incorporate this W into the chromosome: (x_1, ..., x_n, W)
- Apply variation operators to the x_i's and W
- Alert:
  eval((x, W)) = f(x) + W * penalty(x)
  while for mutation step sizes we had
  eval((x, σ)) = f(x)
  This option is thus sensitive: unlike σ, W directly affects eval, so selection can favour individuals that "cheat" by evolving a small W instead of satisfying the constraints

Motivation 1: Multimodality
- Most interesting problems have more than one locally optimal solution.
Implicit 1: island model parallel EAs
(Figure: several EAs running in parallel, with periodic migration of individual solutions between the populations)

Island model EAs, continued:
- Run multiple populations in parallel, in some kind of communication structure (usually a ring or a torus)
- After a (usually fixed) number of generations (an epoch), exchange individuals with neighbours
- Repeat until the ending criteria are met
- Partially inspired by parallel/clustered systems
Island model parameters 1
- Could use different operators in each island
- How often to exchange individuals?
  - too quick: all populations converge to the same solution
  - too slow: waste of time
  - most authors use a range of ~25-150 generations
  - can do it adaptively (stop each population when there is no improvement for, say, 25 generations)

Island model parameters 2
- How many, and which, individuals to exchange?
  - usually ~2-5, but this depends on the population size
- More sub-populations usually give better results, but there can be a critical mass, i.e. a minimum size needed for each sub-population
- Martin et al. found it is better to exchange randomly selected individuals than the best ones
- Can select random/worst individuals to replace
Implicit 2: diffusion model parallel EAs
- Impose a spatial structure (usually a grid) on one population
(Figure: grid showing a current individual and its neighbours)

Diffusion model EAs
- Consider each individual to exist at a point on a (usually rectangular, toroidal) grid
- Selection (hence recombination) and replacement happen using the concept of a neighbourhood, a.k.a. deme
- Leads to different parts of the grid searching different parts of the space; good solutions diffuse across the grid over a number of generations
Diffusion model example
- Assume a rectangular grid, so each individual has 8 immediate neighbours
- The equivalent of one generation is:
  - pick a point in the population at random
  - pick one of its neighbours using a roulette wheel
  - crossover to produce 1 child, mutate
  - replace the individual if the child is fitter
  - cycle through the population until done

Implicit 3: automatic speciation
- Either only mate with genotypically/phenotypically similar members, or
- Add bits (tags) to the problem representation
  - that are initially randomly set
  - subject to recombination and mutation
  - when selecting a partner for recombination, only pick members with a good match
- Can also use tags to perform fitness sharing (see later) to try to distribute members amongst niches
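One generation of the diffusion model above can be sketched on a toroidal grid. The grid size, the OneMax fitness, the asymmetric one-child crossover, and the tiny weight offset (to keep the roulette wheel valid when all neighbours have fitness 0) are all illustrative assumptions.

```python
import random

def diffusion_step(grid, fitness, crossover, mutate):
    """One 'generation' of a diffusion-model EA: for every grid cell, pick a
    mate among its 8 neighbours by roulette wheel, create one mutated child,
    and replace the resident only if the child is fitter."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # The 8 immediate neighbours, wrapping around (toroidal grid).
            neigh = [grid[(r + dr) % rows][(c + dc) % cols]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)]
            mate = random.choices(neigh, weights=[fitness(n) + 1e-9
                                                  for n in neigh])[0]
            child = mutate(crossover(grid[r][c], mate))
            if fitness(child) > fitness(grid[r][c]):  # replace if fitter
                grid[r][c] = child
    return grid

# Usage: a 4x4 grid of 8-bit individuals with OneMax fitness.
grid = [[[random.randint(0, 1) for _ in range(8)] for _ in range(4)]
        for _ in range(4)]
grid = diffusion_step(grid, fitness=sum,
                      crossover=lambda a, b: a[:4] + b[4:],
                      mutate=lambda c: [1 - g if random.random() < 0.1 else g
                                        for g in c])
```

Because mating partners come only from the local neighbourhood, improvements can only spread one cell per generation, which is exactly the "diffusion" of good solutions across the grid described above.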
Explicit 1: fitness sharing
- Restricts the number of individuals within a given niche by sharing their fitness, so as to allocate individuals to niches in proportion to the niche fitness
- Need to set the niche size σ_share, in either genotype or phenotype space
- Run the EA as normal, but after each generation set:

  f'(i) = f(i) / Σ_{j=1}^{µ} sh(d(i, j))

  where sh(d) = 1 - d/σ_share if d < σ_share, and 0 otherwise

Explicit 2: crowding
- Attempts to distribute individuals evenly amongst niches
- Relies on the assumption that offspring will tend to be close to their parents
- Uses a distance metric in phenotype/genotype space
- Randomly shuffle and pair the parents, produce 2 offspring
- 2 parent/offspring tournaments: pair so that d(p1,o1) + d(p2,o2) < d(p1,o2) + d(p2,o1)
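The sharing formula can be sketched for a 1-dimensional phenotype space (so d is just the absolute difference); the positions, raw fitnesses, and σ_share = 1.0 in the usage lines are illustrative.

```python
def shared_fitness(raw, positions, sigma_share):
    """f'(i) = f(i) / sum_j sh(d(i,j)), with the triangular sharing
    function sh(d) = 1 - d/sigma_share for d < sigma_share, else 0."""
    def sh(d):
        return 1 - d / sigma_share if d < sigma_share else 0.0

    out = []
    for i, fi in enumerate(raw):
        # Niche count: includes j = i, since sh(0) = 1.
        niche = sum(sh(abs(positions[i] - positions[j]))
                    for j in range(len(raw)))
        out.append(fi / niche)
    return out

# Usage: two individuals crowded near 0.0 must share; the loner at 5.0 keeps
# its full fitness because no one else is within sigma_share of it.
f = shared_fitness(raw=[4.0, 4.0, 4.0], positions=[0.0, 0.1, 5.0],
                   sigma_share=1.0)
# f[2] stays 4.0, while f[0] and f[1] drop below 4.0.
```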
Fitness sharing vs. crowding
(Figure: population distributions over the peaks of a multimodal landscape under fitness sharing and under crowding)

Multi-objective problems (MOPs)
- A wide range of problems is characterised by the presence of a number n of possibly conflicting objectives:
  - buying a car: speed vs. price vs. reliability
  - engineering design: lightness vs. strength
- Two-part problem:
  - finding a set of good solutions
  - choosing the best one for the particular application
MOPs 1: conventional approaches
- Rely on using a weighting of the objective function values to give a single scalar objective function, which can then be optimised:

  f'(x) = Σ_{i=1}^{n} w_i * f_i(x)

- To find other solutions, one has to re-optimise with different weights w_i

MOPs 2: dominance
- We say x dominates y if it is at least as good on all criteria and better on at least one
(Figure: objective space f_1 vs. f_2, showing a point x, the region dominated by x, and the Pareto front)
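The dominance relation translates directly into code; this sketch assumes minimisation on every objective, and the (price, 1/speed) car example in the usage lines is illustrative.

```python
def dominates(x, y):
    """x dominates y: at least as good on every objective (minimisation)
    and strictly better on at least one."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))

def pareto_front(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Usage: car objectives (price, 1/speed), both to be minimised.
pts = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0)]
front = pareto_front(pts)
# (3.0, 4.0) is dominated by (2.0, 3.0); the other three are non-dominated.
```

This naive all-pairs check is O(n^2) per call; it is enough for a sketch, while MOEAs in practice use faster non-dominated sorting.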
MOPs 3: advantages of the EC approach
- The population-based nature of the search means you can simultaneously search for a set of points approximating the Pareto front
- You don't have to make guesses about which combinations of weights might be useful
- It makes no assumptions about the shape of the Pareto front, which can be convex, discontinuous, etc.

MOPs 4: requirements of the EC approach
- A way of assigning fitness, usually based on dominance
- Preservation of a diverse set of points: similarities to multi-modal problems
- Remembering all the non-dominated points you've seen, usually using elitism or an archive
MOPs 5: fitness assignment
- Could use an aggregating approach and change the weights during evolution: no guarantees
- Different parts of the population use different criteria (e.g. VEGA), but with no guarantee of diversity
- Dominance ranking or depth: fitness related to the whole population

MOPs 6: diversity maintenance
- Usually done by niching techniques, such as:
  - fitness sharing
  - adding an amount to the fitness based on the inverse distance to the nearest neighbour (for minimisation)
  - (adaptively) dividing the search space into boxes and counting occupancy
- All rely on some distance metric in genotype/phenotype space
MOPs 7: remembering good points
- Could just use an elitist algorithm, e.g. (µ + λ) replacement
- It is common to maintain an archive of non-dominated points
  - some algorithms use this as a second population that can take part in recombination etc.
  - others also divide the archive into regions, e.g. PAES