Gurjit Randhawa
Suppose you have a problem and you don't know how to solve it. What can you do? Can you use a computer to somehow find a solution for you? This would be nice! Can it be done?
A blind generate-and-test algorithm:
Repeat
  Generate a random possible solution
  Test the solution and see how good it is
Until the solution is good enough
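The loop above can be sketched directly in code. This is a minimal illustration, not a prescribed implementation: the function names and the toy "count the 1s" problem are my own choices for the example.

```python
import random

def generate_and_test(generate, score, good_enough, max_tries=10000):
    """Blind search: repeatedly generate a random candidate, test it,
    and keep the best one seen so far."""
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        candidate = generate()
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        if good_enough(best_score):
            break
    return best, best_score

# Toy usage: find a 10-bit list with as many 1s as possible.
bits = lambda: [random.randint(0, 1) for _ in range(10)]
solution, value = generate_and_test(bits, sum, lambda s: s == 10)
```

Note that the generator learns nothing from previous tests; every candidate is drawn blindly, which is exactly the weakness the evolutionary variant below addresses.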
Generate a set of random solutions
Repeat
  Test each solution in the set (rank them)
  Remove some bad solutions from the set
  Duplicate some good solutions
  Make small changes to some of them
Until the best solution is good enough
After scientists became disillusioned with classical and neoclassical attempts at modeling intelligence, they looked in other directions. Two prominent fields arose: connectionism (neural networks, parallel processing) and evolutionary computing.
Developed: in the 1970s (USA)
Early names: J. Holland, K. DeJong, D. Goldberg
Typically applied to: optimization problems
Special features: traditionally emphasizes combining information from good parents (crossover); many variants, e.g. reproduction models, operators
A Genetic Algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover. Genetic algorithms are search and optimization algorithms based on the mechanics of natural selection and natural genetics [David E. Goldberg].
Individual - any possible solution
Population - group of all individuals
Search Space - all possible solutions to the problem
Chromosome - blueprint for an individual
Allele - possible settings of a trait
Locus - the position of a gene on the chromosome
The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated; multiple individuals are selected from the current population (based on their fitness) and modified (recombined and possibly mutated) to form a new population.
The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to the maximum number of generations, a satisfactory solution may or may not have been reached.
Holland's original GA is known as the Simple Genetic Algorithm (SGA), which uses the following parameters. Other variants use different parameter combinations.
Representation: binary strings
Crossover: N-point
Mutation: bitwise bit-flipping with fixed probability
Parent selection: fitness-proportionate
Survivor selection: all children replace parents
Speciality: emphasis on crossover
1. Select parents for the mating pool (size of mating pool = population size)
2. Shuffle the mating pool
3. For each consecutive pair, apply crossover with probability p_c; otherwise copy the parents
4. For each offspring, apply mutation (bit-flip with probability p_m independently for each bit)
5. Replace the whole population with the resulting offspring
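The five steps can be sketched as one function. This is a minimal sketch, assuming bit-string individuals and an even population size; the function name, the OneMax usage example, and the "+1" fitness offset (to keep the selection wheel from summing to zero) are my additions.

```python
import random

def sga_generation(pop, fitness, p_c=0.7, p_m=0.01):
    """One generational cycle of the Simple GA (steps 1-5 above).
    Individuals are bit strings; assumes an even population size."""
    n = len(pop)
    # 1. Fitness-proportionate selection fills the mating pool (size n).
    pool = random.choices(pop, weights=[fitness(i) for i in pop], k=n)
    # 2. Shuffle the mating pool.
    random.shuffle(pool)
    # 3. Crossover consecutive pairs with probability p_c, else copy.
    offspring = []
    for a, b in zip(pool[0::2], pool[1::2]):
        if random.random() < p_c:
            cut = random.randrange(1, len(a))   # single-point crossover
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        offspring += [a, b]
    # 4. Bit-flip mutation, independently for each bit.
    offspring = ["".join("10"[int(c)] if random.random() < p_m else c
                         for c in ind) for ind in offspring]
    # 5. The offspring replace the whole population.
    return offspring

# Usage on OneMax (fitness = number of 1s, +1 so the wheel is never empty).
pop = ["0000", "1111", "1010", "0101"]
new_pop = sga_generation(pop, lambda s: s.count("1") + 1)
```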
The SGA has been the subject of many (early) studies and is still often used as a benchmark for novel GAs. It shows many shortcomings, e.g.:
The representation is too restrictive
Its mutation and crossover operators are only applicable to bit-string and integer representations
The selection mechanism is sensitive to converging populations with close fitness values
The generational population model can be improved with explicit survivor selection
Binary
Integers
Floating-point variables
Character representation
String representation
Hybrid representation
Suitable for many problems, e.g. the 0/1 Knapsack Problem:
01010101110
where each bit represents a different object; 0 represents inclusion of the object and 1 represents discarding it, or vice versa. The objective is to maximize profit while keeping the capacity constraint in mind.
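A fitness function for this encoding might look as follows. This sketch adopts the convention that 1 means "include the item" (the text allows either), and it handles capacity violations with a simple zero-fitness penalty; the item values, weights, and function name are illustrative.

```python
def knapsack_fitness(bits, values, weights, capacity):
    """Decode a bit string for the 0/1 knapsack: bit i = 1 means item i
    is packed. Over-capacity solutions score 0 (a crude death penalty;
    repair operators or graded penalties are common alternatives)."""
    total_w = sum(w for b, w in zip(bits, weights) if b)
    total_v = sum(v for b, v in zip(bits, values) if b)
    return total_v if total_w <= capacity else 0

# Hypothetical instance: 4 items, capacity 7.
values, weights = [10, 40, 30, 50], [5, 4, 6, 3]
print(knapsack_fitness([0, 1, 0, 1], values, weights, 7))  # 90 (weight 7)
```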
Some problems naturally have integer variables, e.g. process/job scheduling. N-point and uniform crossover operators work unchanged. Bit-flipping mutation is extended either to creep mutation (more likely to move to a similar value) or to random resetting (a random choice of a new value).
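Creep mutation can be sketched as follows; the function name, the default rate, and the fixed step size of 1 are my own choices for illustration (real implementations often draw the step from a distribution skewed toward small moves).

```python
import random

def creep_mutate(genes, low, high, p_m=0.1, step=1):
    """Creep mutation for integer genes: with probability p_m, nudge a
    gene up or down by a small step, clipped to the range [low, high]."""
    out = []
    for g in genes:
        if random.random() < p_m:
            g = min(high, max(low, g + random.choice([-step, step])))
        out.append(g)
    return out
```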
Many problems occur as real-valued problems, e.g. the (fractional) knapsack problem:
0.3 1.0 1.0 0.0
Here fractional values are also allowed.
Suitable for problems whose solutions can be represented as character strings, e.g. the Prisoner's Dilemma: cddccd, where c represents Cooperate and d represents Defect.
Categorical Representation of solutions e.g. pink red pink blue
A well-suited alternative for multi-objective functions; multiple constraints can be represented together, e.g. in protein engineering, medical sciences, and biology: A0.3 B0.2
Roulette Wheel Selection
Stochastic Universal Sampling
Sigma Scaling - keeps selection pressure (the degree to which highly fit individuals are allowed many offspring) constant throughout the run; an individual's expected value is a function of its fitness and the population's mean and standard deviation
Rank Selection - the ratio of expected values of individuals ranked i and i+1 is the same whether their absolute fitness differences are high or low
Tournament Selection
Boltzmann Selection - similar to simulated annealing
Elitism
Steady-State Selection
Main idea: better individuals get a higher chance; chances are proportional to fitness.
Implementation: roulette wheel technique
» Assign to each individual a slice of the roulette wheel proportional to its fitness
» Spin the wheel n times to select n individuals
Example: fitness(A) = 3, fitness(B) = 1, fitness(C) = 2
A: 3/6 = 50%, B: 1/6 = 17%, C: 2/6 = 33%
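The wheel-spinning idea translates to a short function. A minimal sketch: the function name is mine, and it assumes non-negative fitnesses with a positive total; the usage line reproduces the A/B/C example above.

```python
import random

def roulette_select(population, fitnesses, n):
    """Spin the wheel n times; each individual's slice of [0, total)
    is proportional to its fitness."""
    total = sum(fitnesses)
    picks = []
    for _ in range(n):
        spin = random.uniform(0, total)
        acc = 0.0
        for ind, f in zip(population, fitnesses):
            acc += f
            if spin <= acc:
                picks.append(ind)
                break
        else:  # guard against floating-point round-off at the wheel's edge
            picks.append(population[-1])
    return picks

# Slide example: fitness(A)=3, fitness(B)=1, fitness(C)=2.
parents = roulette_select(["A", "B", "C"], [3, 1, 2], 6)
```

On average A fills about half the picks, but any mix is possible on a single run; Stochastic Universal Sampling reduces exactly this sampling variance.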
Performance with 1-point crossover depends on the order in which variables occur in the representation: it is more likely to keep together genes that are near each other, and it can never keep together genes from opposite ends of the string. This is known as positional bias. It can be exploited if we know about the structure of our problem, but this is not usually the case.
Choose a random point on the two parents, split the parents at this crossover point, and create children by exchanging tails. p_c is typically in the range (0.6, 0.9).
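In code, the tail exchange is a one-liner on string slices; the function name is illustrative, and equal-length parents are assumed.

```python
import random

def one_point_crossover(p1, p2):
    """Split both parents at one random point and swap the tails."""
    point = random.randrange(1, len(p1))  # cut strictly inside the string
    return p1[:point] + p2[point:], p2[:point] + p1[point:]
```

Because the children swap whole tails, the combined multiset of genes at each position is preserved; only the pairing changes.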
Choose n random crossover points, split along those points, and glue the parts, alternating between parents. A generalisation of 1-point crossover (still some positional bias).
Assign 'heads' to one parent and 'tails' to the other. Flip a coin for each gene of the first child; the second child gets an inverse copy of that gene. Inheritance is independent of position.
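A per-gene coin flip looks like this in code (function name illustrative, equal-length parents assumed):

```python
import random

def uniform_crossover(p1, p2):
    """Flip a coin for each gene: child 1 takes it from one parent,
    child 2 takes the other parent's gene at that position."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:   # heads: keep genes as-is
            c1.append(g1); c2.append(g2)
        else:                       # tails: swap the genes
            c1.append(g2); c2.append(g1)
    return "".join(c1), "".join(c2)
```

Unlike 1-point and n-point crossover, genes from opposite ends of the string can end up in the same child, so there is no positional bias.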
Alter each gene independently with probability p_m. p_m is called the mutation rate and can be chosen based on different conditions, e.g. 1/pop_size or 1/chromosome_length. Typical values: in nature, 0.0001; in practical implementations, 0.4-0.6.
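Bitwise mutation on a bit string can be sketched as follows; the function name is mine, and the default of 1/L follows the 1/chromosome_length rule of thumb mentioned above.

```python
import random

def bit_flip_mutate(chromosome, p_m=None):
    """Flip each bit of a bit string independently with probability p_m
    (default 1/L, i.e. one expected flip per chromosome)."""
    if p_m is None:
        p_m = 1 / len(chromosome)
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < p_m else bit
        for bit in chromosome
    )
```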
A decades-long debate: which one is better / necessary / the main operator? The answer (at least, one with rather wide agreement): it depends on the problem, but in general it is good to have both.
Exploration: discovering promising areas in the search space, i.e. gaining information about the problem.
Exploitation: optimising within a promising area, i.e. using that information.
There is co-operation AND competition between them. Crossover is explorative: it makes a big jump to an area somewhere in between the two (parent) areas. Mutation is exploitative: it creates random small diversions, thereby staying near (in the area of) the parent.
Only crossover can combine information from two parents Only mutation can introduce new information (alleles) Crossover does not change the allele frequencies of the population To hit the optimum you often need a lucky mutation
Simple problem: maximise x^2 over {0, 1, ..., 31}.
GA approach:
Representation: binary code, e.g. 01101 <-> 13
Population size: 4
Single-point crossover, bitwise mutation
Roulette wheel selection
Random initialization
We show one generational cycle done by hand.
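The same example can be run end to end in a few lines. A sketch under the slide's settings (5-bit strings, population 4, roulette wheel, single-point crossover, bitwise mutation); the function name, generation count, and the "+1" on the wheel weights (so an all-zero population can still be selected from) are my additions.

```python
import random

def run_ga(generations=20, pop_size=4, p_c=0.7, p_m=0.05):
    """Maximise f(x) = x^2 over {0, ..., 31} with a 5-bit simple GA."""
    decode = lambda bits: int(bits, 2)        # e.g. '01101' -> 13
    fitness = lambda bits: decode(bits) ** 2
    pop = ["".join(random.choice("01") for _ in range(5))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Roulette-wheel selection (+1 keeps the wheel's total positive).
        pool = random.choices(pop, [fitness(i) + 1 for i in pop], k=pop_size)
        random.shuffle(pool)
        nxt = []
        for a, b in zip(pool[0::2], pool[1::2]):
            if random.random() < p_c:          # single-point crossover
                cut = random.randrange(1, 5)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a, b]
        # Bitwise mutation; offspring replace the whole population.
        pop = ["".join("10"[int(c)] if random.random() < p_m else c
                       for c in ind) for ind in nxt]
    best = max(pop, key=fitness)
    return decode(best)

random.seed(1)
print(run_ga())  # best x found; the global optimum is x = 31
```

With such a tiny population the run often, but not always, reaches x = 31, which illustrates the "no guarantee of optimality" point in the conclusions.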
GAs are algorithms inspired by nature. They are very useful in search and optimization problems and can reduce computation time to a great extent. A GA does not guarantee an optimal solution, but it may provide an optimal or near-optimal solution quite efficiently. GAs are good for handling optimization problems with large data.