A More Stable Approach To LISP Tree GP

Joseph Doliner

August 15, 2008

Abstract

In this paper we begin by familiarising ourselves with the basic concepts of Evolutionary Computing and how it can be used to hasten searches in large combinatorial optimization problems. We continue by looking at Holland's Schema Theorem, which addresses the problem of locality that makes scaling of genetic solutions difficult. We then look at how these concepts apply to Genetic Programming, where the goal is the evolution of computer programs to solve specific tasks. Finally, I present a novel solution that aims to overcome the problem of locality.

1 Background

Evolutionary Computing (EC) is a promising subfield of Artificial Intelligence which has only recently become an active topic of research. EC aims to apply the concepts of biological evolution and natural selection to combinatorial optimization problems. Since EC is a new field, exactly how one should apply the concepts of evolution is still debated. However, most EC implementations do agree on a few things: the process begins with a random selection of solutions from the solution space S, often referred to as the primordial soup. This random selection forms the first generation, G₀. Each solution in the generation (called an organism) is then evaluated by a fitness function, f : S → R, and assigned a fitness. The generation then procreates. Organisms are randomly selected (naturally selected) from the population, with probability given by a strictly increasing function of fitness, to participate in simulated sexual reproduction, in which offspring are created that are genetically similar to both parents. After reproduction the genetic information of the offspring is randomized slightly to simulate mutation. The computation ends when an organism is found to have a certain high level of fitness (possibly maximal); many implementations also stop after a certain number of generations. There is very little consensus on the specifics of how this should be handled; however, EC has proved very good at solving certain problems which elude other methods such as exhaustive search, and so interest in the field has remained high.

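To make this loop concrete, here is a minimal sketch of it in Python. This is an illustration of the process described above rather than any particular implementation; random_solution, fitness, crossover, and mutate stand in for the encoding-specific pieces discussed in the next section, and rank weighting is one arbitrary choice of a strictly increasing function of fitness.

    import random

    def evolve(random_solution, fitness, crossover, mutate,
               pop_size=100, max_generations=50, target_fitness=None):
        # G_0: the primordial soup of random solutions.
        population = [random_solution() for _ in range(pop_size)]
        for _ in range(max_generations):
            ranked = sorted(population, key=fitness)   # worst ... best
            if target_fitness is not None and fitness(ranked[-1]) >= target_fitness:
                return ranked[-1]
            # Natural selection: sampling weight increases with fitness
            # (rank weighting, so negative fitnesses are handled too).
            weights = list(range(1, len(ranked) + 1))
            children = []
            while len(children) < pop_size:
                p1, p2 = random.choices(ranked, weights=weights, k=2)
                # Each act of reproduction yields two offspring,
                # which are then mutated slightly.
                children.extend(mutate(c) for c in crossover(p1, p2))
            population = children[:pop_size]
        return max(population, key=fitness)
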
2 Encodings

Generally the aspects of EC that are open to interpretation are the encoding of the solutions, the algorithm used as the reproduction operation, and the fitness function. In most implementations the reproduction algorithm is very simple, almost completely evident from the data structures. Thus of particular interest to this paper is the encoding of the solution: the data structure which is used to represent elements of the solution space.

2.1 Holland's Schema Theory

One theory to aid in understanding how EC works is the building block theory. This theory holds that over time organisms discover building blocks which convey above-average fitness. These building blocks then spread throughout the population and are combined with other above-average blocks until a solution is found. To investigate building blocks we need to introduce the concept of a schema. Consider a simple encoding of strings of length N from the alphabet {1, 0}. In this situation a schema is a string from the alphabet {1, 0, ∗}, where ∗ represents "don't care". For example, ∗10∗ represents {1100, 1101, 0100, 0101}.

Holland's Schema Theory predicts how the abundance of schemata should change over time [2]. The abundance of the schema H in generation t is denoted m(H,t) and is calculated simply as the number of organisms in the population which are representatives of the schema. The number of non-∗ symbols is called the order, denoted O(H). The distance between the two outermost non-∗ symbols is called the defining length, denoted L(H).

Suppose that we use crossover reproduction. That is, to reproduce we take two strings, pick an integer k ∈ [1, N−1], and create two child strings, in which the first k symbols of the first child are the same as those of the first parent and the remaining N−k are the same as those of the second parent. Note that we don't allow crossover at the very ends of the string, as this would merely propagate the parents into the next generation, something we may choose to do separately. The second child is created by the same method but with the parents' roles reversed.

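For this string encoding, the schema statistics and the crossover operator are easy to write down. A minimal sketch, using the character '*' for the don't-care symbol:

    def order(schema):
        # O(H): the number of non-* symbols.
        return sum(1 for c in schema if c != '*')

    def defining_length(schema):
        # L(H): the distance between the two outermost non-* symbols.
        fixed = [i for i, c in enumerate(schema) if c != '*']
        return fixed[-1] - fixed[0] if fixed else 0

    def matches(schema, s):
        # True if string s is a representative of the schema.
        return all(c == '*' or c == x for c, x in zip(schema, s))

    def crossover_k(a, b, k):
        # One-point crossover at k in [1, N-1]: swap the tails after position k.
        return a[:k] + b[k:], b[:k] + a[k:]

    # For the schema *10* above: order('*10*') == 2, defining_length('*10*') == 1.
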
Note that our solution space is closed under this operation (which we denote ⊗_k, depending on our choice of k); this is a requirement of reproduction operations. For example, if we have the two strings 1111111111 and 0000000000 (N = 10) then:

    1111111111 ⊗₅ 0000000000 = {1111100000, 0000011111}

Holland's theory states that for a string encoding:

Theorem 2.1.

    E[m(H, t+1)] = M p(H,t) (1 − p_m)^{O(H)} [1 − (L(H)/(N−1)) (1 − p(H,t))]

Here M is the size of our population and p_m is the probability of mutating a bit (recall that our model involves random mutations). p(H,t) is the probability of selecting a representative of the schema, calculated as

    p(H,t) = m(H,t) f(H,t) / (M f(t))

where f(H,t) is the average fitness of the representatives of H in generation t and f(t) is the average fitness of an organism in generation t. Furthermore, E[·] denotes the expected value of a random variable.

What this is saying is that, as expected, fit schemata will increase in abundance while unfit schemata will decrease in abundance. However, as our schemata grow in size, the chance that crossover will occur somewhere within the schema, thus destroying it, increases. This theorem holds for any string encoding that uses crossover reproduction; for the more advanced encodings we will merely need to adjust our concept of the length of a schema.

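Read as code, the theorem is a single product. The sketch below simply evaluates the right-hand side for given schema statistics (the arguments are named after the quantities defined above); it is useful for sanity-checking the limiting behaviour discussed in the next section.

    def expected_abundance(m_H, f_H, f_avg, M, p_m, O_H, L_H, N):
        # p(H,t): probability of selecting a representative of H.
        p = (m_H * f_H) / (M * f_avg)
        # Factor for the schema surviving mutation of its O(H) fixed symbols.
        survives_mutation = (1 - p_m) ** O_H
        # Factor for not being broken up by crossover inside its defining length.
        survives_crossover = 1 - (L_H / (N - 1)) * (1 - p)
        # E[m(H, t+1)] as given by Theorem 2.1.
        return M * p * survives_mutation * survives_crossover

    # As L_H approaches N - 1 the crossover factor collapses toward p, so even
    # a very fit but long schema struggles to spread.
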
2.2 Locality

Holland's theory considers EC as a search through the possible schemata. As we sort through the schemata randomly we come upon certain ones that are better than others. We select such that these better ones become more abundant in the next generation than their less fit counterparts, and thus are explored more. As the process continues the schemata are perfected and combined with one another, making more complicated (longer) schemata. If we take this to be how EC works, then we can expect it to work best when the equation above depends largely on the fitness of the schema. Toward this goal we will completely ignore the (1 − p_m)^{O(H)} term: we have complete control over it, and it is normally close to 1. The big problem here is that the equation becomes overly dependent on the term (L(H)/(N−1))(1 − p(H,t)) as O(H) → N, since as O(H) → N we have L(H)/(N−1) → 1, so this term approaches 1 − p(H,t).

For an EC to perform according to the building block theory, it is important that it increase the abundance of fit schemata while decreasing the abundance of bad ones. As the theorem shows, implementations have trouble doing this when the schemata grow too large. This property of an EC, how good it is at proliferating fit schemata regardless of their length, is called its locality. Encodings that exhibit poor locality have a tendency to proliferate superschemata of the fit schemata by crossing over within the schema, and they have trouble scaling to find specific solutions in large spaces.

2.3 An Illustrative Example

Consider the following example:

    Straight
    Solution Set: [0,9]^10
    Fitness Function: f(S) = sup{ L(s) : s is a substring of S and s is a straight }
    Reproduction Mechanism: crossover (denoted ⊗_k, k ∈ [1,9])

Here a straight is just a string s₁s₂...sₙ such that s_{i+1} = s_i + 1 for all i, and here we use L(s) for the length of the string s, not the defining length of a schema. Our intuition about this example is that it should exhibit poor locality. If we consider the schema 012345678∗, then despite the fact that this is an exceptionally fit schema, the probability that an offspring will be a member of the same schema is only around 1/9, since crossover occurring anywhere but the last position (k = 9) will result in offspring that aren't members of the schema.

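As a quick sketch (assuming an organism is simply a Python string of the digits 0-9, as in the solution set above), the fitness function is a single scan:

    def straight_fitness(S):
        # f(S): length of the longest substring in which each digit
        # exceeds the previous one by exactly 1.
        best = run = 1
        for prev, cur in zip(S, S[1:]):
            run = run + 1 if int(cur) == int(prev) + 1 else 1
            best = max(best, run)
        return best

    # straight_fitness("0123456789") == 10; straight_fitness("0123456780") == 9.
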
3 Other Encodings

3.1 LISP Trees

Here we turn to a subset of EC called Genetic Programming (GP). Genetic Programming is evolutionary computing in which our solution set consists of computer programs which are evolved to solve a specific problem. GP also sometimes refers to the evolution of physical aspects of machines; in particular it has been used with great success to evolve electronic controllers and circuits. To solve this type of problem our encodings have to get more complicated, and along with them our reproduction operators. To begin with, we select a set of terminals, which are independent variables (inputs to our functions), constants, and zero-argument functions (these might be used to return a global state of the machine). Next we come up with a primitive set of functions to be used in the program. Again, we need a fitness function and a reproduction operator. The organisms in our population are LISP trees in which every node is a primitive function with as many children as the function has arguments. We further specify that the leaves must all be members of the terminal set. Reproduction occurs by selecting a subtree of each parent and swapping them at the roots of the subtrees.

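A minimal sketch of this encoding, assuming trees are nested tuples (op, left, right) with the primitive names 'mul' and 'sub' and terminals at the leaves; the paper itself works with LISP S-expressions, so this representation is only illustrative.

    import random

    def evaluate(tree, x):
        # Leaves are terminals: the variable 'x' or a numeric constant.
        if tree == 'x':
            return x
        if not isinstance(tree, tuple):
            return tree
        op, left, right = tree
        a, b = evaluate(left, x), evaluate(right, x)
        return a * b if op == 'mul' else a - b

    def subtrees(tree, path=()):
        # Yield (path, subtree) for every node; a path addresses a node.
        yield path, tree
        if isinstance(tree, tuple):
            for i, child in enumerate(tree[1:], start=1):
                yield from subtrees(child, path + (i,))

    def replace_at(tree, path, new):
        if not path:
            return new
        parts = list(tree)
        parts[path[0]] = replace_at(parts[path[0]], path[1:], new)
        return tuple(parts)

    def splice(a, b):
        # Reproduction: pick a random subtree of each parent and swap them.
        pa, sa = random.choice(list(subtrees(a)))
        pb, sb = random.choice(list(subtrees(b)))
        return replace_at(a, pa, sb), replace_at(b, pb, sa)

    # E.g. the tree for x*x - 1: evaluate(('sub', ('mul', 'x', 'x'), 1), 3) == 8.
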
3.2 An Example

This is best understood by an example. Suppose we wish to evolve a program to calculate g(x) = x² − 1. We might implement the following solution:

    Terminal Set: {0, 1, x}
    Primitive Function Set: {×, −}
    Fitness Function: f(O) = −∫_{−10}^{10} |O(x) − (x² − 1)| dx
    Reproduction Operator: subtree splicing (denoted ⊗)

Notice that here a perfect solution would have a fitness of 0, with worse solutions diverging toward −∞. Continuing with the example, suppose that we select the following initial population:

[Figure: four randomly generated trees O₀ through O₃, with fitnesses f(O₀) = −2060/3, f(O₁) = −2060/3, f(O₂) = −101, and f(O₃) = −1948/3.]

Next we go through and make new members for our population by breeding the four above. Because O₂ is substantially more fit than the others, it is more likely to be chosen, so our random choices might result in the combinations O₂ ⊗ O₀ and O₂ ⊗ O₃. Since each reproduction produces two children, this will be enough to completely repopulate.

[Figure: the crossover O₂ ⊗ O₀ producing Child 0 and Child 1, and the crossover O₂ ⊗ O₃ producing Child 2 and Child 3.]

Now we would iterate this process again, except that this time we find that Child 2 has maximal fitness, so we know that we've found as good an answer as we can expect this algorithm to find (and it happens to be the best answer anyone is going to find).

3.3 LISP Tree Locality

Again with LISP trees we have the problem of locality. In our example O₂ had a particularly high fitness, and we can see that this is because it figured out that the answer was the sum of x² and something. However, our algorithm lacks a concept of what makes the organism good. It was just lucky that the algorithm chose to swap the left daughter with 1 and not to make a more destructive swap. Ultimately the problem boils down to the fact that with this encoding we have no way of separating the useless or detrimental parts of an organism's genetic code from the helpful parts.

4 A Slightly Different Approach

Our problems stem from the fact that our algorithm attempts to build monolithic solutions; it lacks a concept of building blocks. In the previous example our chances of destroying the building block that made O₂ good were higher than our chances of making a meaningful improvement. Furthermore, the previous solution was a trivial test case. For bigger problems, in which the correct trees are bigger, we can expect the probability of making a wrong selection to grow rapidly: the number of possible decisions grows exponentially with the depth of the tree. And so we can see that this approach has trouble with scalability.

4.1 The Encoding

Instead of using the trees of the prior example, which are the equivalent of expressions, we make a slight modification and allow trees with inputs marked as ∗s, making them the equivalent of LISP functions. We'll call these functional-trees. For example:

[Figure: a functional-tree with three ∗ inputs. It is equivalent to the function f(x₁, x₂, x₃) = (x₁ × x₂) − x₃.]

In this encoding we still allow numeric constants and zero-argument functions as terminals; however, terminals are no longer the method by which the independent variables are placed. Notice that this notation can also be considered as a schema, if we consider the stars as don't-cares, meaning that they could be replaced with any tree. To complete these as schemata there is an implied ∗ at the top of the tree; i.e., not only can a member of the schema contain arbitrary trees in place of the stars, it also need only contain the functional-tree as a subtree. The above tree is an example of a member of the schema represented by both of the following trees:

[Figure: two functional-trees whose schemata contain the tree above.]

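Continuing the sketch from Section 3.1, one way to represent functional-trees is to reuse the tuple encoding and mark input slots with numbered ('*', i) leaves; the representation and helper below are illustrative assumptions, not the paper's notation.

    # The Section 4.1 example, f(x1, x2, x3) = (x1 * x2) - x3:
    example = ('sub', ('mul', ('*', 0), ('*', 1)), ('*', 2))

    def slots(tree):
        # The set of distinct input-slot indices; its size is the number
        # of arguments the functional-tree accepts.
        if isinstance(tree, tuple) and tree[0] == '*':
            return {tree[1]}
        if not isinstance(tree, tuple):
            return set()
        found = set()
        for child in tree[1:]:
            found |= slots(child)
        return found
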
4.2 Reproduction

With this encoding we lose a natural way to evaluate fitness, since we have members of the population that accept the wrong number of arguments. However, reproduction becomes a very graceful procedure: we have the very natural operation of piping one function's output in as another's input to compose them. This operation is versatile in that it allows us to create a great range of functions. However, its most elegant feature is that, if we think of our agents as schemata, then combining two schemata under this operation yields a subset of their intersection, so we can expect to learn more about each.

As elegant as this operation is, it has a very noticeable deficiency. When we removed the independent variables as terminals, we made the process of specifying where the variables are inputted harder. Suppose the above example were being used in a population to derive x² − 1; it would be impossible, since there's simply no way to specify that the two branches of the × node should use the same variable. So along with our pipe operation we add a pinch operation which combines two input slots into one:

[Figure: the Section 4.1 tree with the two inputs of the × node pinched together.]

With this the tree is very close to finding a solution.

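A sketch of how pipe and pinch might look on the representation from the previous sketch (reusing slots() from there); this is one possible reading of the operations, not the paper's own code.

    def renumber(tree, offset):
        # Shift every slot index by `offset`, keeping slots distinct.
        if isinstance(tree, tuple) and tree[0] == '*':
            return ('*', tree[1] + offset)
        if not isinstance(tree, tuple):
            return tree
        return (tree[0],) + tuple(renumber(c, offset) for c in tree[1:])

    def pipe(f, g, slot):
        # Pipe g's output into input slot `slot` of f, composing the two.
        g = renumber(g, max(slots(f), default=-1) + 1)
        def sub(t):
            if isinstance(t, tuple) and t[0] == '*':
                return g if t[1] == slot else t
            if not isinstance(t, tuple):
                return t
            return (t[0],) + tuple(sub(c) for c in t[1:])
        return sub(f)

    def pinch(tree, i, j):
        # Combine input slots i and j: both positions now receive the same input.
        if isinstance(tree, tuple) and tree[0] == '*':
            return ('*', i) if tree[1] == j else tree
        if not isinstance(tree, tuple):
            return tree
        return (tree[0],) + tuple(pinch(c, i, j) for c in tree[1:])

    # Pinching the two inputs of the 'mul' node in the Section 4.1 example
    # yields ('sub', ('mul', ('*', 0), ('*', 0)), ('*', 2)), i.e.
    # f(x1, x2) = (x1 * x1) - x2, which is one pipe away from x^2 - 1.
    pinched = pinch(('sub', ('mul', ('*', 0), ('*', 1)), ('*', 2)), 0, 1)
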
4.3 Evaluating Fitness

Notice that our trouble with evaluating these organisms is not universal. Some organisms happen to accept the correct number of arguments and can be evaluated in a straightforward manner. In searching for a global way to evaluate organisms of this encoding, we should recall our treatment of them as schemata. As such, the question we want to answer about each element is: how well would members of this schema that do accept the correct number of arguments fare? Now, we could begin looking at the members of the schema. Of course we can't do this exhaustively, since the trees grow without bound; but we always knew that we wouldn't be able to exhaustively search an infinite set, so placing an upper bound on tree size is acceptable. Still, doing exhaustive searches is expensive, inelegant, and precisely what genetic solutions are supposed to replace. The whole point of Evolutionary Computing is that we have something better than random sampling: we have an entire population of better-than-average subtrees. We can use these as smarter building blocks to do a smarter evaluation.

4.4 A More Elegant Version

This process of evaluation, wherein we combine selections from our population in order to evaluate them, can be made just a touch more elegant. Our population itself is composed entirely of combinations of elements, each generation building upon the last by combining old to make new. As mentioned, some members of our population can be evaluated easily. However, these elements are composed of other elements from the population. So we can consider fitness not as an assigned value but as a currency: the functions that happen to be evaluable are assigned a fitness but must share a certain amount of it with their constituent functions. This is much more elegant, as we need only evaluate a subset of our population and need not deal with sampling the schemata to evaluate. Note that an organism born in the nth generation is composed of organisms born in generations prior to n, so for this system to work organisms need to serve for multiple generations, giving them a chance to be used in compositions. This property of organisms surviving into later generations is commonly known as incest.

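The mechanics of the currency are left open; as one possible reading, here is a sketch in which each directly evaluable organism receives a raw fitness and passes an assumed fixed fraction (share) of everything it receives back to its constituents through a recorded composition graph. Both the share parameter and the bookkeeping structure are assumptions for illustration.

    def credit_fitness(parts, raw_fitness, share=0.5):
        # parts: dict mapping organism -> list of constituent organisms,
        #        in insertion order from oldest to newest (a composition DAG).
        # raw_fitness: fitness of the organisms that can be evaluated directly.
        credit = {name: 0.0 for name in parts}
        for name in reversed(list(parts)):            # newest first
            credit[name] += raw_fitness.get(name, 0.0)
            if parts[name]:
                passed = share * credit[name]
                credit[name] -= passed
                for constituent in parts[name]:
                    credit[constituent] += passed / len(parts[name])
        return credit

    # E.g. if c was composed from a and b and evaluates to raw fitness 10,
    # then with share=0.5, c keeps 5 and a and b each receive 2.5:
    credit_fitness({'a': [], 'b': [], 'c': ['a', 'b']}, {'c': 10.0})
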
5 Computation

Genetic Programming is a very computationally intensive task. Furthermore, Genetic Programming is more than theoretically interesting: it is an extremely useful technique which can be used as an invention machine to rival human invention. As such, we are interested in being able to do this fast and on a large scale. The main key to speed is how well the algorithm parallelizes. Parallelizing GP is really a trivial task, because it is composed of a number of discrete computations (chiefly the fitness evaluations) that can be done independently.

6 Conclusion

This modification of LISP-tree-based GP may seem contrived at first. However, it is really a very natural response to the problem of locality. A running theme in EC is that any time we, the implementers, relinquish control over an aspect of the encoding and evolution process, the evolution takes control of it and thus naturally selects the best way to handle it. This is just a logical next step, wherein we stop imposing contrived locations where our organisms may cross over their genetic information and instead let them specify these locations for themselves.

References

[1] D. E. Goldberg, B. Korb, and K. Deb. Messy Genetic Algorithms: Motivation, Analysis and First Results. 1989.

[2] R. Poli. Hyperschema Theory for GP with One-Point Crossover, Building Blocks, and Some New Results in GA Theory. 2001.