Genetic Programming Prof. Thomas Bäck Natural Evolutionary Computing Algorithms Group Genetic Programming 1
Genetic programming
- The idea originated in the 1950s (e.g., Alan Turing)
- Popularized in the 1990s by John R. Koza (Stanford University, US)
- Computationally very expensive, but with increasing CPU power it is slowly becoming applicable to complex problems
- Uses the paradigm of Evolutionary Computation to breed computer programs that can perform certain pre-defined tasks
The challenge
- How can computers learn to solve problems without being explicitly programmed?
- In other words, how can computers be made to do what needs to be done, without being told exactly how to do it?
- Attributed to Arthur Samuel (1959)
Genetic Programming in a nutshell
- Population of programs
- Creation of offspring programs from parents
- Evaluation by testing, and selection of parents proportional to fitness
- Requires:
  - Evolution cycle
  - Program representation
  - Initial population generation
  - Genetic operators
  - Fitness calculation
Evolution Cycle
Genetic Programming flowchart
[Figure: flowchart from Koza]
Programs and representation
A computer program
[Figure: input → program → output; the program potentially contains subroutines, loops, recursions, and internal storage]
Representation
- The first step in building an EA is to find a representation of the problem
- For GP, the problem is to find a representation for computer programs that is open to the standard genetic operators
- Program trees (parse trees) make good representations for GP
- Program tree = parse tree → LISP
Program structure
- A program tree consists of functions, arguments, and terminals
- Program trees are created by selecting from a fixed set of functions and a fixed set of terminals
- Program trees can be composed recursively: a function can be the argument to another function
Program trees are widely applicable
- With a terminal set and a function set we can represent:
  - Arithmetic formulas, e.g. $2\pi + \left((x + 3) - \frac{y}{5 + 1}\right)$
  - Logical formulas, e.g. $(x \wedge \mathrm{true}) \rightarrow ((x \vee y) \vee (z \leftrightarrow (x \wedge y)))$
  - Computer programs, e.g. i = 1; while (i < 20) { i = i + 1 }
Example: Arithmetic formula tree
- Formula: $2\pi + \left((x + 3) - \frac{y}{5 + 1}\right)$
- Tree in prefix notation: ( + ( · 2 π ) ( - ( + x 3 ) ( / y ( + 5 1 ) ) ) )
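The prefix notation above maps directly onto a nested data structure. The following is a minimal sketch, assuming a hypothetical encoding in which a tree is either a terminal (a number or a variable name) or a tuple `(symbol, child, child)`; the names `FUNCTIONS` and `evaluate` are illustrative and not taken from the slides.

```python
import math
import operator

# Illustrative function set: symbol -> binary Python function.
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

def evaluate(tree, env):
    """Recursively evaluate a program tree given variable bindings in env."""
    if isinstance(tree, tuple):
        symbol, *args = tree
        return FUNCTIONS[symbol](*(evaluate(arg, env) for arg in args))
    if isinstance(tree, str):
        return env[tree]      # variable or named constant
    return tree               # numeric constant

# ( + ( * 2 pi ) ( - ( + x 3 ) ( / y ( + 5 1 ) ) ) )
formula = ("+", ("*", 2, "pi"), ("-", ("+", "x", 3), ("/", "y", ("+", 5, 1))))
value = evaluate(formula, {"pi": math.pi, "x": 1.0, "y": 12.0})
```

With x = 1 and y = 12 the tree evaluates to 2π + (4 - 2). Immutable tuples are used here so that variation operators can build new trees without altering the parents.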
Example: Logical formula tree
- Formula: $(x \wedge \mathrm{true}) \rightarrow ((x \vee y) \vee (z \leftrightarrow (x \wedge y)))$
- Tree in prefix notation: ( → ( ∧ x true ) ( ∨ ( ∨ x y ) ( ↔ z ( ∧ x y ) ) ) )
Example: Program tree
- Program: i = 1; while (i < 20) { i = i + 1 }
- Tree in prefix notation: ( ; ( = i 1 ) ( while ( < i 20 ) ( = i ( + i 1 ) ) ) )
Representation issues
Three issues for the choice of representation of programs in Genetic Programming:
- Closure: the function set should be well defined for any combination of arguments
- Sufficiency: the function and terminal sets must be able to produce a solution
- Universality: the function and terminal sets should be larger than the minimum required for sufficiency
Closure
- Requirement: each function in the function set should be well defined for any combination of arguments
- That is, each function must be able to accept as input any value and data type that may possibly be returned by any function or terminal
- Example: {AND, OR, NOT} with {T, NULL} is closed
- Example: {+, -, /, *} with {X, Y}, where X and Y are real numbers, is not closed, because division by 0 is undefined
Adding Closure
- Closure can usually be achieved easily
- A few examples of special cases and common fixes:
  - Square root of a negative number
  - Logarithm of 0
  - Division by zero: define a protected division function which returns 1 when division by 0 is attempted
- Define special functions which modify unacceptable input conditions
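A possible sketch of such protected functions follows. Only the division rule (return 1 on division by 0) comes from the slide; the fixes for the square root and the logarithm are assumptions chosen for illustration, and other conventions are equally valid.

```python
import math

def protected_div(a, b):
    """Division that returns 1 when division by 0 is attempted (rule from the slide)."""
    return 1.0 if b == 0 else a / b

def protected_sqrt(a):
    """Assumed fix: take the square root of the absolute value."""
    return math.sqrt(abs(a))

def protected_log(a):
    """Assumed fix: logarithm of the absolute value, with log(0) defined as 0."""
    return 0.0 if a == 0 else math.log(abs(a))
```

Replacing `/`, `sqrt`, and `log` in the function set by these variants makes every function total over the reals, which is what closure requires.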
Sufficiency
- There should be sufficient reason to believe that some composition of the functions and terminals can yield a solution to the problem
- In some domains the requirements are well known, in others they are not; ultimately, the user must find a set which works
- Example: the function set {+, -} is not sufficient for describing expressions that require the multiplication and/or division operator
- Example: the erf function (Gaussian error function) might be a necessary function for statistics computations
Universality
- A closed, sufficient function/terminal set can in principle lead to a solution
- However, the addition of extraneous functions may either degrade or improve the performance of the GP system
- In practice, a few extra functions seem to improve the performance and the range of application of a GP system
Initialization
Initial program creation
- Pick a random function from the function set F to be the root node of the tree
- Every function has a fixed number of arguments (unary, binary, ...); for each argument, create a node from either the function set F or the terminal set T
- If a terminal is selected, it becomes a leaf
- If a function is selected, expand this function recursively
- A maximum depth is used to make sure the process stops
Three methods for program creation
- A maximum initial depth of trees, D_max, is set
- Full method (each branch has depth = D_max):
  - nodes at depth d < D_max are randomly chosen from the function set F
  - nodes at depth d = D_max are randomly chosen from the terminal set T
- Grow method (each branch has depth ≤ D_max):
  - nodes at depth d < D_max are randomly chosen from F ∪ T
  - nodes at depth d = D_max are randomly chosen from T
- Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
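The three creation methods can be sketched as follows. This is a minimal illustration under stated assumptions: trees are nested tuples, the function and terminal sets are made up, the root is always a function (as on the previous slide), and the depth ramp from 2 to D_max is a common convention that the slide does not spell out.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # illustrative function set: symbol -> arity
TERMINALS = ["x", 1]                   # illustrative terminal set

def make_tree(max_depth, method, rng, depth=0):
    """Create a tree as nested tuples using the 'full' or 'grow' method."""
    if depth == max_depth:
        return rng.choice(TERMINALS)                       # d = D_max: terminals only
    if method == "grow" and depth > 0:
        symbol = rng.choice(list(FUNCTIONS) + TERMINALS)   # d < D_max: F ∪ T
    else:
        symbol = rng.choice(list(FUNCTIONS))               # full method, or the root
    if symbol not in FUNCTIONS:
        return symbol                                      # a terminal becomes a leaf
    children = (make_tree(max_depth, method, rng, depth + 1)
                for _ in range(FUNCTIONS[symbol]))
    return (symbol, *children)

def tree_depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])

def ramped_half_and_half(pop_size, max_depth, rng):
    """Alternate full and grow while ramping the depth limit (requires max_depth >= 2)."""
    population = []
    for i in range(pop_size):
        method = "full" if i % 2 == 0 else "grow"
        limit = 2 + (i // 2) % (max_depth - 1)   # assumed ramp over 2..max_depth
        population.append(make_tree(limit, method, rng))
    return population
```

For example, `ramped_half_and_half(20, 4, random.Random(42))` yields ten full and ten grow trees with depth limits cycling through 2, 3, and 4, which gives the initial population a mix of shapes and sizes.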
Genetic Operators
Mutation types for trees
- Mutation operators applied in tree-based GP
Point Mutation
[Figure: example program tree before and after point mutation]
Permutation
[Figure: example program tree before and after permutation]
Hoist
[Figure: example program tree before and after hoist mutation]
Expansion Mutation
[Figure: example program tree before and after expansion mutation]
Collapse Subtree Mutation
[Figure: example program tree before and after collapse subtree mutation]
Subtree Mutation
[Figure: example program tree before and after subtree mutation]
Mutation
- Mutation has two parameters:
  - Probability p_m to choose mutation vs. recombination
  - Probability to choose an internal point as the root of the subtree to be replaced
- Remarkably, p_m is advised to be 0 (Koza 92) or very small, like 0.05 (Banzhaf et al. 98)
- The size of the child can exceed the size of the parent
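Subtree mutation with the second parameter (the probability of choosing an internal point) might look like the sketch below. It assumes trees stored as nested tuples; the helper names, the value `p_internal=0.9`, and the small random-tree generator are illustrative assumptions rather than part of the slides.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # illustrative function set: symbol -> arity
TERMINALS = ["x", 1]                   # illustrative terminal set

def random_tree(max_depth, rng):
    """Small grow-style generator used to create the replacement subtree."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    symbol = rng.choice(list(FUNCTIONS))
    return (symbol, *(random_tree(max_depth - 1, rng)
                      for _ in range(FUNCTIONS[symbol])))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree in which the subtree at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_mutation(tree, rng, p_internal=0.9, max_depth=3):
    """Replace a randomly chosen subtree by a freshly generated one."""
    nodes = list(paths(tree))
    internal = [p for p in nodes if isinstance(get(tree, p), tuple)]
    leaves = [p for p in nodes if not isinstance(get(tree, p), tuple)]
    pool = internal if internal and rng.random() < p_internal else leaves
    return replace(tree, rng.choice(pool), random_tree(max_depth, rng))
```

Because the replacement subtree is generated independently of the one it replaces, the child can be larger than the parent, as noted above.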
Crossover types for trees
Subtree Exchange Crossover
Self-crossover
Module Crossover
Recombination
- Recombination has two parameters:
  - Probability p_c to choose recombination versus mutation
  - Probability to choose an internal point within each parent as crossover point
- The size of the offspring can exceed that of the parents
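A minimal sketch of subtree exchange crossover on nested-tuple trees is given below. The helper names and the default `p_internal=0.9` are assumptions made for illustration; the slides do not prescribe a value.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree in which the subtree at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def pick_point(tree, rng, p_internal):
    """Choose a crossover point, preferring internal nodes with probability p_internal."""
    nodes = list(paths(tree))
    internal = [p for p in nodes if isinstance(get(tree, p), tuple)]
    leaves = [p for p in nodes if not isinstance(get(tree, p), tuple)]
    pool = internal if internal and rng.random() < p_internal else leaves
    return rng.choice(pool)

def subtree_crossover(mum, dad, rng, p_internal=0.9):
    """Swap one randomly chosen subtree between two parents; returns two children."""
    a = pick_point(mum, rng, p_internal)
    b = pick_point(dad, rng, p_internal)
    return replace(mum, a, get(dad, b)), replace(dad, b, get(mum, a))
```

The two children together contain exactly as many nodes as the two parents, but a single child can be larger than either parent when it receives a big subtree in exchange for a small one.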
Selection
- Parent selection: typically fitness proportional
- Greedy over-selection in very large populations:
  - rank the population by fitness and divide it into two groups: group 1 holds the best x% of the population, group 2 the other (100-x)%
  - 80% of selection operations choose from group 1, 20% from group 2
  - for population sizes 1000, 2000, 4000, 8000: x = 32%, 16%, 8%, 4%
  - motivation: to increase efficiency; the percentages come from a rule of thumb
- Sometimes also survivor selection:
  - Typical: generational scheme (thus (μ,λ)- or (μ+λ)-selection)
  - Recently, steady-state schemes are becoming popular for their elitism
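Greedy over-selection could be sketched as below. The 80%/20% split and the 32% group size for a population of 1000 come from the slide; the function name, the convention that higher fitness is better, and the uniform choice inside each group are simplifying assumptions.

```python
import random

def over_select(population, fitness, rng, top_fraction=0.32, p_top=0.8):
    """Pick one parent: with probability p_top from the best group, else from the rest."""
    ranked = sorted(population, key=fitness, reverse=True)   # best individuals first
    cut = max(1, int(len(ranked) * top_fraction))
    group1, group2 = ranked[:cut], ranked[cut:]
    group = group1 if (rng.random() < p_top or not group2) else group2
    return rng.choice(group)   # uniform within the group (assumption)

# Toy population of 1000 individuals whose fitness equals their value.
rng = random.Random(0)
population = list(range(1000))
picks = [over_select(population, lambda ind: ind, rng) for _ in range(10000)]
share_from_top = sum(p >= 680 for p in picks) / len(picks)
```

In this toy run, `share_from_top` comes out close to 0.8, so the best 32% of the population receive about 80% of the selection operations.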
Editing Operator
- An operator that provides a way to simplify expressions while the evolution process is running
- The editing operator recursively applies a pre-established set of domain-independent and domain-specific editing rules, preserving the context
- Examples:
  - (AND X X) → X
  - (OR X X) → X
  - (NOT (NOT X)) → X
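The three example rules can be applied recursively as in the following sketch, which assumes Boolean trees stored as nested tuples; the function name `edit` is illustrative.

```python
def edit(tree):
    """Recursively simplify a Boolean tree with the three example editing rules."""
    if not isinstance(tree, tuple):
        return tree
    op, *args = tree
    args = [edit(arg) for arg in args]            # simplify the children first
    if op in ("AND", "OR") and args[0] == args[1]:
        return args[0]                            # (AND X X) -> X, (OR X X) -> X
    if op == "NOT" and isinstance(args[0], tuple) and args[0][0] == "NOT":
        return args[0][1]                         # (NOT (NOT X)) -> X
    return (op, *args)

simplified = edit(("OR", ("AND", "X", "X"), ("NOT", ("NOT", "X"))))
```

Simplifying bottom-up lets a rule fire on a node after its children have been reduced: in the example both arguments first reduce to X, and the OR rule then reduces the whole tree to X.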
Fitness calculation
Program fitness
- The fitness of a program is calculated as the error over a set of fitness cases, obtained from a number of runs with different inputs
- In most cases the set of possible inputs is infinite, so the fitness has to be obtained from runs on a subset of all possible inputs
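As a small illustration, the error over a finite set of fitness cases can be computed as below. The use of the summed absolute error and the ten sample inputs are assumptions; the slides only state that fitness is an error measured on a subset of the possible inputs.

```python
import math

def fitness(program, cases):
    """Sum of absolute errors over the fitness cases (lower is better)."""
    return sum(abs(program(x) - target) for x, target in cases)

# Fitness cases: (input, desired output) pairs for a sin(x) target.
cases = [(float(x), math.sin(x)) for x in range(10)]

perfect = fitness(math.sin, cases)          # a program that reproduces the target
constant = fitness(lambda x: 0.0, cases)    # a program that always returns 0
```

A program that matches the target on every case has error 0, while any deviation increases the error, so selection can rank candidate programs by this single number.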
Application examples
Symbolic regression of sin(x)
- Objective: find a computer program with one input (x) whose output equals the value of sin(x) in the range from 0 to 9 rad (0.0 < x < 172)
- Terminal set: T = {X, Constants}
- Fitness: error for x = 0, 1, ..., 9
- Termination: after 31 generations
Symbolic regression of sin(x)
Results using different function sets:

Function set                Result            Generation   Error (final)
F1: { +, -, *, /, sin }     sin(x)            0            0.00
F2: { +, -, *, /, cos }     cos(x + 4.66)     12           0.40
F3: { +, -, *, / }          -0.32 x^2 + x     29           1.36
Symbolic regression of sin(x)
[Figure: "GP for sin(x)". Function value versus x in rad for sin(x), cos(x+4.66), and -0.32x^2+x]
Controllers for Autonomous Robots
- Transportation of an object to the goal (light)
- Cooperation of two robots
Setup of GP for Robot Control
- Function set: F = { IF_OBJ, IF_GOAL, IF_FORWARD, IF_OBS1 }
- Terminal set: T = { MOVE_FORWARD, MOVE_FORWARD & TURN_LEFT, MOVE_FORWARD & TURN_RIGHT, MOVE_BACKWARD, TURN_LEFT, TURN_RIGHT, RANDOM }
- Fitness function:
  (1) $F_{new} = F_{old} + w_1 \cdot (\#\,\mathrm{collisions}) + w_2 \cdot (\#\,\mathrm{steps})$
  (2) $F_{new} = F_{old} + w_1 \cdot (\#\,\mathrm{miss}) + w_2 \cdot (\#\,\mathrm{steps}) + w_3 \cdot \mathrm{vision}$
Experimental Results: Best Fitness
Experimental Results: Average Fitness
Experimental Results: Hits
Practicalities
Bloat
- Bloat: redundant parts of trees that do not contribute to fitness
- Observed effect: "survival of the fattest", i.e., the tree sizes in the population increase over time
- Ongoing research and debate about the reasons
- Needs countermeasures, e.g.:
  - Prohibiting variation operators from growing trees beyond a maximum size
  - Parsimony pressure: penalty for size
- Why counteract? Additional structure can be an adaptation for future changes (epigenetics, pre-adaptations). However, this argument seems mainly important for evolution in dynamically changing environments.
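Parsimony pressure can be sketched as an additive size penalty on the error. The linear form and the coefficient `alpha` are assumptions made for illustration; the slide only says that size is penalized.

```python
def size(tree):
    """Number of nodes in a tree stored as nested tuples."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

def penalized_error(error, tree, alpha=0.01):
    """Raw error plus a penalty proportional to tree size (lower is better)."""
    return error + alpha * size(tree)

small = ("+", "x", 1)
bloated = ("+", "x", ("*", 1, ("-", "x", ("-", "x", 1))))
```

If both trees reach the same raw error, the penalized error of `small` is lower than that of `bloated`, so selection favours the more compact program.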
Performance
- Acceptable performance at acceptable costs on a wide range of problems
- Intrinsic parallelism (robustness, fault tolerance)
- Superior to other techniques on complex problems with
  - lots of data, many free parameters
  - complex relationships between parameters
  - many (local) optima
Other Genetic Programming variants
- Linear Genetic Programming
  - Programs are represented as sequences of imperative instructions
  - Three types: stack-based, register-based, machine-code
- Grammatical evolution
  - Evolving solutions according to a user-specified grammar (usually in BNF)
Advantages
- No presumptions w.r.t. problem space
- Widely applicable, but no free lunch!
- Easy to incorporate other methods
- Solutions are interpretable (unlike NN)
- Provide many alternative solutions
Disadvantages
- No guarantee for optimal solution within finite time
- Weak theoretical basis
- May need parameter tuning
- Often computationally expensive, i.e., slow
Open ended evolution?
- Can we evolve programs that work but that we no longer understand?
- This could evolve robots or internet bots that we might be unable to control.
- Open-ended evolution: still, the fitness would be determined by the human programmer.
- Some researchers claim that open-ended evolution would occur if programs evaluate each other, i.e., if they coevolve.
- Example: chess computer by Kantschik. But can programs invent new games?
- http://www.tim-taylor.com/papers/thesis/html/node39.html
Dangerous toys or useful tools?
- Worst-case scenario described by Stephen Hawking: "It [AI] would take off on its own, and re-design itself at an ever increasing rate," he said. "Humans, who are limited by slow biological evolution, couldn't compete, and would be superseded."
- A more optimistic view would be to use evolution to solve very difficult problems, such as finding potent drugs, clean energy production, and conflict resolution strategies.
- Nanorobots, fighting against cancer
- Stanley Kubrick's film 2001 and its murderous computer HAL
- "Stephen Hawking warns artificial intelligence could end mankind", http://www.bbc.com/news/technology-30290540