Advanced Topics in Image Analysis and Machine Learning Introduction to Genetic Algorithms Week 3 Faculty of Information Science and Engineering Ritsumeikan University
Today's class outline: (1) Introduction to Genetic Algorithms; (2) Image Restoration Project Introduction
Genetic Algorithm (GA) OVERVIEW A class of probabilistic optimisation algorithms Inspired by the biological evolution process Uses concepts of Natural Selection and Genetic Inheritance (Darwin 1859) Originally developed by John Holland (1975) Special Features: Traditionally emphasizes combining information from good parents (crossover) There are many GA variants, e.g., reproduction models, operators
GA overview (cont.) Particularly well suited for hard problems where little is known about the underlying search space Widely used in business, science, and engineering Holland's original GA is now known as the simple genetic algorithm (SGA). Other GAs use different: Representations Mutations Crossovers Selection mechanisms
Function Optimisation GAs are useful for solving multidimensional problems containing many local maxima (or minima) in the solution space [Figure: a simple one-dimensional optimisation problem with a global and a local maximum; no GA is needed to solve this one!]
A standard method of finding maxima or minima is the gradient descent (gradient ascent) method [Figure: a climber reaches a peak, "I found the top!"] Problem: this method may find only a local maximum!
Genetic Algorithm: the Idea [Figure: several climbers on the landscape at different heights, e.g., 10.5 m, 13.2 m, 3.6 m, 7.5 m] The Genetic Algorithm uses multiple climbers in parallel to find the global optimum
Genetic algorithm, some iterations later [Figure: a climber has approached the global maximum, "I found the top!"]
GA Stochastic operators Selection replicates the most successful solutions found in a population at a rate proportional to their relative quality Crossover takes two distinct solutions and then randomly mixes their parts to form novel solutions Mutation randomly perturbs (changes, agitates) a candidate solution
The Metaphor
Genetic Algorithm ↔ Nature
Optimization problem ↔ Environment
Feasible solutions ↔ Individuals living in that environment
Solution quality (fitness function) ↔ Individual's degree of adaptation to its surrounding environment
The Metaphor (cont.)
Genetic Algorithm ↔ Nature
A set of feasible solutions ↔ A population of organisms (species)
Stochastic operators ↔ Selection, recombination and mutation in nature's evolutionary process
Iteratively applying a set of stochastic operators on a set of feasible solutions ↔ Evolution of populations to suit their environment
Simple Genetic Algorithm
1. Produce an initial population of individuals (parents)
2. Evaluate the fitness of all parents
3. While the termination condition is not met do:
   a. Select fitter parents for reproduction
   b. Recombine (crossover) between fit parents to make offspring
   c. Mutate the offspring
   d. Evaluate the fitness of the offspring
   e. Replace the whole population with the resulting offspring
end while
4. Output the best offspring (highest fitness)
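The loop above can be sketched in Python. This is a minimal illustrative sketch, not the required implementation: the fitness function, rates, population size, and seed are example values, and the pairing/replacement details are one reasonable reading of the slide.

```python
import random

def simple_ga(fitness, gene_len=10, pop_size=6, cx_rate=0.6,
              mut_rate=0.05, generations=50, seed=0):
    """Minimal Simple GA: roulette selection, 1-point crossover,
    bit-wise mutation, full replacement of the population."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(gene_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        if sum(scores) == 0:                 # degenerate case: pick uniformly
            scores = [1] * pop_size
        # Fitness-proportionate (roulette-wheel) selection of parents
        parents = [list(rng.choices(pop, weights=scores)[0])
                   for _ in range(pop_size)]
        # 1-point crossover on consecutive pairs, with probability cx_rate
        for i in range(0, pop_size - 1, 2):
            if rng.random() < cx_rate:
                pt = rng.randint(1, gene_len - 1)
                parents[i][pt:], parents[i + 1][pt:] = \
                    parents[i + 1][pt:], parents[i][pt:]
        # Bit-wise mutation of the offspring
        for g in parents:
            for j in range(gene_len):
                if rng.random() < mut_rate:
                    g[j] = 1 - g[j]
        pop = parents                        # replace the whole population
    return max(pop, key=fitness)

best = simple_ga(sum)   # MAXONE-style fitness: number of ones in the gene
```

With `sum` as the fitness, this optimises the MAXONE-style "count the ones" objective used in the worked example that follows.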
The Evolutionary Cycle [Figure: initiate & evaluate the population → select the fittest parents → modify (crossover and mutation) → evaluate the modified offspring → replace the population, discarding the deleted members]
GA Example: the MAXONE problem Suppose we want to maximise the number of ones in a string of 10 binary digits A gene can be encoded as a string of 10 binary digits, e.g., 0010110101 The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code, e.g., f(0010110101) = 5 We start with a population of n random strings. Suppose that n = 6
Example (initialisation) Our initial population of parent genes is made using random binary data:
s1 = 1111010101, f(s1) = 7
s2 = 0111000101, f(s2) = 5
s3 = 1110110101, f(s3) = 7
s4 = 0100010011, f(s4) = 4
s5 = 1110111101, f(s5) = 8
s6 = 0100110000, f(s6) = 3
The fitness f of a parent gene is simply the sum of its bits.
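Evaluating this initial population can be written in a couple of lines (a sketch; the function name is illustrative):

```python
def maxone_fitness(gene: str) -> int:
    """MAXONE fitness: simply the number of 1 bits in the gene string."""
    return gene.count("1")

# The slide's initial population
population = ["1111010101", "0111000101", "1110110101",
              "0100010011", "1110111101", "0100110000"]
fitnesses = [maxone_fitness(g) for g in population]
# fitnesses == [7, 5, 7, 4, 8, 3]; total population fitness == 34
```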
Selection Selection is an operation that is used to choose the best parent genes from the current population for breeding a new child population Purpose: to focus the search in promising regions of the solution space
Example (Selection) Next we apply fitness-proportionate selection with the roulette wheel method: individual i has probability p(i) = f(i) / Σj f(j) of being chosen, i.e., each individual's slice of the wheel has an area proportional to its fitness value. We repeat the extraction as many times as the number of individuals we need, to keep the parent population size the same (6 in our case).
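The roulette wheel can be sketched as follows (illustrative names; the wheel is "spun" by drawing a uniform number in [0, Σf] and walking the cumulative fitness):

```python
import random

def roulette_select(population, fitnesses, n, rng=random):
    """Fitness-proportionate selection: individual i is drawn with
    probability f(i) / sum_j f(j); the wheel is spun n times."""
    total = float(sum(fitnesses))
    chosen = []
    for _ in range(n):
        r = rng.uniform(0.0, total)          # where the wheel stops
        acc = 0.0
        for individual, f in zip(population, fitnesses):
            acc += f
            if r <= acc:                     # this slice contains the stop
                chosen.append(individual)
                break
    return chosen

pop = ["1111010101", "0111000101", "1110110101",
       "0100010011", "1110111101", "0100110000"]
fit = [7, 5, 7, 4, 8, 3]
parents = roulette_select(pop, fit, 6, random.Random(1))
```

Note that selection is with replacement, so a fit gene such as s5 (p = 8/34 ≈ 0.24) can be picked more than once, exactly as happens on the next slide.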
Example (selection continued) Suppose that, after performing selection, we get the following parent population s` (the original parent s is shown in parentheses):
s1` = 1111010101 (s1)
s2` = 1110110101 (s3)
s3` = 1110111101 (s5)
s4` = 0111000101 (s2)
s5` = 0100010011 (s4)
s6` = 1110111101 (s5)
Example (crossover) Next we mate the parent strings using crossover. For each pair of parents we decide, according to a crossover probability (for instance 0.6), whether to actually perform crossover or not. Suppose that we decide to perform crossover only for pairs (s1`, s2`) and (s5`, s6`). For each pair, we randomly choose a crossover point, for instance bit 2 for the first pair and bit 5 for the second pair.
Example (crossover cont.)
Before crossover: s1` = 1111010101, s2` = 1110110101, s5` = 0100010011, s6` = 1110111101
After crossover: s1`` = 1110110101, s2`` = 1111010101, s5`` = 0100011101, s6`` = 1110110011
Note: sometimes crossover results in no changes to the pair!
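One-point crossover is just a tail swap at the chosen point; the sketch below reproduces the slide's two pairs (function name is illustrative):

```python
def one_point_crossover(a: str, b: str, point: int):
    """Swap the tails of two parent strings after the crossover point."""
    return a[:point] + b[point:], b[:point] + a[point:]

# First pair, crossover after bit 2:
c1, c2 = one_point_crossover("1111010101", "1110110101", 2)
# c1 == "1110110101", c2 == "1111010101"
# Second pair, crossover after bit 5:
c5, c6 = one_point_crossover("0100010011", "1110111101", 5)
# c5 == "0100011101", c6 == "1110110011"
```

The first pair shows the slide's note in action: the parents share their first two bits, so swapping tails at bit 2 merely exchanges the two strings.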
Example (mutation) The final step is to apply random mutation: for each bit in the current gene population we allow a small probability of mutation (for instance 0.05).
Before mutation → after mutation (fitness):
s1`` = 1110110101 → s1``` = 1110100101, f(s1```) = 6
s2`` = 1111010101 → s2``` = 1111110100, f(s2```) = 7
s3`` = 1110111101 → s3``` = 1110101111, f(s3```) = 8
s4`` = 0111000101 → s4``` = 0111000101, f(s4```) = 5
s5`` = 0100011101 → s5``` = 0100011101, f(s5```) = 5
s6`` = 1110110011 → s6``` = 1110110001, f(s6```) = 6
Purpose: mutation adds new information that may be missing from the current population
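Bit-wise mutation can be sketched as an independent coin flip per bit (an illustrative helper, not required code):

```python
import random

def mutate(gene: str, rate: float = 0.05, rng=random) -> str:
    """Flip each bit independently with a small probability (mutation rate)."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in gene
    )
```

At rate 0.0 the gene is returned unchanged; at rate 1.0 every bit flips. With the slide's rate of 0.05 and 60 population bits, about 3 flips per generation are expected, matching the handful of changes shown above.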
Example: Results In one generation, the total population fitness changed from 34 to 37, thus improved by ~9% At this point, we go through the same process all over again (repetition), until a stopping criterion is met
Another example: Maximise x² Simple problem: maximise y = x² over the integer interval x ∈ {0, 1, ..., 31} GA approach: Representation: 5-bit binary code, e.g., binary 01101 = decimal 13 Population size: 4 genes (parents) Random initialisation Roulette wheel selection 1-point crossover, bit-wise mutation We will show one generational cycle as an example
x² example: selection Make sure you understand this slide! You will implement something similar during your image restoration coding project! Probability calculation for gene s1 (x = 13, f = 169): Prob(169) = 169/1170 ≈ 0.144 Expected count(s1) = Prob(s1) × n = 0.144 × 4 ≈ 0.58
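The slide's numbers can be reproduced directly. The full initial population is not listed on the slide; the values below (13, 24, 8, 19) are the classic ones for this textbook example and are consistent with the slide's totals (Σf = 1170, best = 576), but treat them as an assumption:

```python
# Assumed initial population for the x^2 example: x = 13, 24, 8, 19
population = [0b01101, 0b11000, 0b01000, 0b10011]
fitnesses = [x * x for x in population]      # 169, 576, 64, 361
total = sum(fitnesses)                       # 1170

probs = [f / total for f in fitnesses]                     # selection probs
expected_counts = [p * len(population) for p in probs]     # prob * n
# probs[0] == 169/1170 ≈ 0.144 and expected_counts[0] ≈ 0.58, as on the slide
```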
x² example: crossover Each pair of genes may undergo crossover; the crossover points are randomly selected. Notice that, after crossover, the average population fitness increased from 293 to 439, and the best gene's fitness increased from 576 to 729!
x² example: mutation All gene bits may undergo mutation (based on the mutation rate). Notice that, after mutation, the average population fitness increased from 439 to 588 (the best gene's fitness did not change, though)!
GA Group Projects Today we will form teams of several students. Each team will implement a GA in Matlab (or C, Java, or VB) to restore a corrupted image. Each team should have one good programmer and access to a notebook computer (preferably with Matlab)! You will submit a written report in week 14 and give a short presentation in week 15 (in English)
GA Group Project: details The corruption source is additive noise of the form:
N(row, col) = NoiseAmp × sin(2π × NoiseFreqRow × row + 2π × NoiseFreqCol × col)
Teams must code a simple GA that optimises the three unknown constants NoiseAmp, NoiseFreqRow, and NoiseFreqCol such that the restoration error (the difference between the original and the GA-optimised restored image) is minimised. To make things easy, we will measure the average per-pixel restoration error, thus:
Restoration error = (I_original + Noise_GA) − I_corrupted
where I_original is the original uncorrupted Lena image, I_corrupted is the corrupted image (which I will give you), and Noise_GA is the GA's modelled corruption noise using the noise equation above.
GA Group Project: details Each iteration of your GA will, for each gene in the population:
1. Generate new values for NoiseAmp, NoiseFreqRow, and NoiseFreqCol.
2. Corrupt the original image using the equation N(row, col) = NoiseAmp × sin(2π × NoiseFreqRow × row + 2π × NoiseFreqCol × col).
3. Measure the restoration error (subtract the GA-corrupted image from the original corrupted image). This becomes the (inverse of) this gene's fitness.
4. Make new child genes using selection, crossover, and mutation functions.
The search ranges for the three variables are: NoiseAmp: 0 to 30.0; NoiseFreqRow: 0 to 0.01; NoiseFreqCol: 0 to 0.01.
Each gene encodes all three variables. If you use 1 byte per variable, each gene will be 24 bits; if you use 2 bytes per variable, 48 bits:
10110111 01010001 11001010 (24 bits per gene: NoiseAmp | NoiseFreqRow | NoiseFreqCol)
You need to map the (binary) integer values of each gene to floating-point values for the variables, i.e., for NoiseAmp, 00000000 = 0.0 and 11111111 = 30.0
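The decoding and fitness steps above can be sketched like this (the project asks for Matlab or C/Java/VB; this Python sketch only illustrates the idea, with an 8-bit-per-variable layout and with the absolute value of the per-pixel difference assumed so that positive and negative errors do not cancel):

```python
import math

# Search ranges from the project description
RANGES = {"NoiseAmp": (0.0, 30.0),
          "NoiseFreqRow": (0.0, 0.01),
          "NoiseFreqCol": (0.0, 0.01)}

def decode_gene(bits: str):
    """Map a 24-bit gene (8 bits per variable) to the three noise parameters,
    linearly: 00000000 -> range minimum, 11111111 -> range maximum."""
    assert len(bits) == 24
    params = {}
    for i, (name, (lo, hi)) in enumerate(RANGES.items()):
        raw = int(bits[8 * i: 8 * (i + 1)], 2)           # integer 0..255
        params[name] = lo + (hi - lo) * raw / 255.0
    return params

def noise(params, row, col):
    """The additive noise model from the project description."""
    return params["NoiseAmp"] * math.sin(
        2 * math.pi * params["NoiseFreqRow"] * row
        + 2 * math.pi * params["NoiseFreqCol"] * col)

def restoration_error(params, original, corrupted):
    """Average per-pixel |original + modelled noise - corrupted|;
    the gene's fitness is the inverse of this error."""
    h, w = len(original), len(original[0])
    err = sum(abs(original[r][c] + noise(params, r, c) - corrupted[r][c])
              for r in range(h) for c in range(w))
    return err / (h * w)
```

A perfect gene reproduces the corruption noise exactly, so its restoration error is 0 and its fitness is maximal.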
Next Lecture We will learn more about Genetic Algorithms (GAs) and discuss the image restoration project. Read: Gonzalez and Woods Course website: http://www.ritsumei.ac.jp/~gulliver/iaml
Homework: Project Preparation Start coding your GA. User inputs are: population size (integer, e.g., 50), crossover rate (%, integer, e.g., 60), mutation rate (%, integer, e.g., 5), and total iterations (integer, e.g., 100). Make arrays to hold the gene binary values. Fill the arrays with random binary data. Map each gene's binary values to the three noise parameter values (floating point). Using the equation N(row, col) = NoiseAmp × sin(2π × NoiseFreqRow × row + 2π × NoiseFreqCol × col), calculate the corruption noise for each pixel of the image. Remember, the noise values can be negative, so use signed data types.
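The preparation steps above can be sketched as follows (again in Python for illustration; the names, the 8-bit-per-variable layout, and the example parameter values are assumptions, not requirements):

```python
import math
import random

# Example user inputs from the homework description
POP_SIZE, CX_RATE_PCT, MUT_RATE_PCT, ITERATIONS = 50, 60, 5, 100
GENE_BITS = 24                       # 8 bits for each of the 3 parameters

# Arrays of random binary gene data, one list of bits per gene
rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(GENE_BITS)]
              for _ in range(POP_SIZE)]

def to_float(bits, lo, hi):
    """Map one 8-bit field of a gene to a float in [lo, hi]."""
    raw = int("".join(map(str, bits)), 2)    # integer 0..255
    return lo + (hi - lo) * raw / 255.0

def noise_image(gene, rows, cols):
    """Corruption noise for every pixel; values can be negative,
    so everything stays in signed floating point."""
    amp = to_float(gene[0:8], 0.0, 30.0)
    f_row = to_float(gene[8:16], 0.0, 0.01)
    f_col = to_float(gene[16:24], 0.0, 0.01)
    return [[amp * math.sin(2 * math.pi * f_row * r
                            + 2 * math.pi * f_col * c)
             for c in range(cols)] for r in range(rows)]
```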