Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland

Similar documents
GENETIC ALGORITHM with Hands-On exercise

The k-means Algorithm and Genetic Algorithm

Genetic Algorithms. Kang Zheng Karl Schober

Neural Network Weight Selection Using Genetic Algorithms

Introduction to Genetic Algorithms. Based on Chapter 10 of Marsland Chapter 9 of Mitchell

CHAPTER 4 GENETIC ALGORITHM

Genetic Algorithms. Genetic Algorithms

Outline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search

Introduction to Genetic Algorithms. Genetic Algorithms

4/22/2014. Genetic Algorithms. Diwakar Yagyasen Department of Computer Science BBDNITM. Introduction

Evolutionary Computation Part 2

CHAPTER 5 ENERGY MANAGEMENT USING FUZZY GENETIC APPROACH IN WSN

Introduction to Optimization

Introduction to Optimization

Mutations for Permutations

The Parallel Software Design Process. Parallel Software Design

Introduction to Genetic Algorithms

Artificial Intelligence Application (Genetic Algorithm)

Genetic Algorithm for Dynamic Capacitated Minimum Spanning Tree

Automata Construct with Genetic Algorithm

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter

Evolutionary Computation. Chao Lan

Similarity Templates or Schemata. CS 571 Evolutionary Computation

Information Fusion Dr. B. K. Panigrahi

Reducing Graphic Conflict In Scale Reduced Maps Using A Genetic Algorithm

DETERMINING MAXIMUM/MINIMUM VALUES FOR TWO- DIMENTIONAL MATHMATICLE FUNCTIONS USING RANDOM CREOSSOVER TECHNIQUES

Genetic Algorithm for Dynamic Capacitated Minimum Spanning Tree

A Genetic Algorithm Approach to the Group Technology Problem

[Premalatha, 4(5): May, 2015] ISSN: (I2OR), Publication Impact Factor: (ISRA), Journal Impact Factor: 2.114

A Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2

Using Genetic Algorithms in Integer Programming for Decision Support

Genetic Algorithms Variations and Implementation Issues

Introduction to Optimization

Lecture 6: Genetic Algorithm. An Introduction to Meta-Heuristics, Produced by Qiangfu Zhao (Since 2012), All rights reserved

A Steady-State Genetic Algorithm for Traveling Salesman Problem with Pickup and Delivery

Discriminate Analysis

Introduction to Evolutionary Computation

The Genetic Algorithm for finding the maxima of single-variable functions

Genetic Algorithm. Dr. Rajesh Kumar

Heuristic Optimisation

March 19, Heuristics for Optimization. Outline. Problem formulation. Genetic algorithms

Computational Intelligence

Automatic Selection of GCC Optimization Options Using A Gene Weighted Genetic Algorithm

CS5401 FS2015 Exam 1 Key

Evolutionary Computation Algorithms for Cryptanalysis: A Study

Chapter 14 Global Search Algorithms

Genetic Algorithm for Circuit Partitioning

CHAPTER 4 FEATURE SELECTION USING GENETIC ALGORITHM

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM

Job Shop Scheduling Problem (JSSP) Genetic Algorithms Critical Block and DG distance Neighbourhood Search

Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you?

AN EVOLUTIONARY APPROACH TO DISTANCE VECTOR ROUTING

Khushboo Arora, Samiksha Agarwal, Rohit Tanwar

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

V.Petridis, S. Kazarlis and A. Papaikonomou

FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION

Adaptive Crossover in Genetic Algorithms Using Statistics Mechanism

Genetic Algorithms for Classification and Feature Extraction

CHAPTER 5. CHE BASED SoPC FOR EVOLVABLE HARDWARE

Binary Representations of Integers and the Performance of Selectorecombinative Genetic Algorithms

Local Search (Greedy Descent): Maintain an assignment of a value to each variable. Repeat:

Evolutionary Algorithms. CS Evolutionary Algorithms 1

Chapter 9: Genetic Algorithms

ARTIFICIAL INTELLIGENCE (CSCU9YE ) LECTURE 5: EVOLUTIONARY ALGORITHMS

Genetic Programming: A study on Computer Language

Genetic Model Optimization for Hausdorff Distance-Based Face Localization

Grid Scheduling Strategy using GA (GSSGA)

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming

Kyrre Glette INF3490 Evolvable Hardware Cartesian Genetic Programming

MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS

Preliminary Background Tabu Search Genetic Algorithm

Multiobjective Job-Shop Scheduling With Genetic Algorithms Using a New Representation and Standard Uniform Crossover

Genetic Algorithm based Fractal Image Compression

Automated Test Data Generation and Optimization Scheme Using Genetic Algorithm

Encoding Techniques in Genetic Algorithms

Genetic Algorithm Based Template Optimization for a Vision System: Obstacle Detection

An Introduction to Evolutionary Algorithms

Genetic Programming. and its use for learning Concepts in Description Logics

Distributed Optimization of Feature Mining Using Evolutionary Techniques

Network Based Models For Analysis of SNPs Yalta Opt

Genetic Algorithms Applied to the Knapsack Problem

A New Selection Operator - CSM in Genetic Algorithms for Solving the TSP

Classification of Hyperspectral Breast Images for Cancer Detection. Sander Parawira December 4, 2009

What is GOSET? GOSET stands for Genetic Optimization System Engineering Tool

Population Genetics (52642)

Optimizing Flow Shop Sequencing Through Simulation Optimization Using Evolutionary Methods

Genetic Programming. Modern optimization methods 1

A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS

CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM

Optimization of Association Rule Mining through Genetic Algorithm

Introduction to Design Optimization: Search Methods

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION

Using Genetic Algorithm with Triple Crossover to Solve Travelling Salesman Problem

Using Genetic Algorithms to Solve the Box Stacking Problem

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Optimization Technique for Maximization Problem in Evolutionary Programming of Genetic Algorithm in Data Mining

Research Article Path Planning Using a Hybrid Evolutionary Algorithm Based on Tree Structure Encoding

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem

Feature Selection Using Principal Feature Analysis

Transcription:

Genetic Programming Charles Chilaka Department of Computational Science Memorial University of Newfoundland Class Project for Bio 4241 March 27, 2014 Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 1 / 37

Outline 1 Historical perspective, definitions and examples. 2 Genetic Algorithms and its mathematical formulation Mathematical formulation of the Schemata equation 3 Genetic Programming 4 Application of Genetic programming in Biological Research Limitations of Genetic Programming Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 2 / 37

Historical perspective, definitions and examples. Historical perspective, definitions and examples Basic elements of genetic programming GP introduced by Stanford University Professor John R. Coza in 1987. It is a domain dependent method that genetically breeds population of computer programs to solve problems. Basically applies the principle of Darwinian evolution to breed a new (and often improved) population of programs. It is a step further in Genetic Algorithms GA. Prof. Wolfgang Banzhaf (our own) is a leading light in this area. The development history: EC(Rechenberg, 1960)= GP(Koza, 1987/1992)+ ES (Rechenberg,1965)+ EP (Fogel, 1962) + GA(Holland, 1970) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 3 / 37

Historical perspective, definitions and examples. Historical perspective, definitions and examples The mathematical foundation of genetic algorithms and genetic programming is the schemata theory proposed by J.H. Holland in 1973. It is a statement about the propagation of schemata (or building blocks) within all individuals of one generation. Definition A Schemata are sets of strings that have one or more features in common. It consists of bit strings (0,1) and a don t care symbol. Every schema matches exactly 2 r strings, where r is the number of don t care symbols in the schema template. A schema is a generalization of (parts of) an individual. Example The set of the schema 1101 0 are 1110110, 1110100, 0110110, 0110100. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 4 / 37

Historical perspective, definitions and examples. Definition The length of a schema α(s) is the distance from the first to the last fixed symbol(1 or 0 but not ). Example The individuals 100100100100000111101000101 and 010010010100000111111010101 of length 25 can be summarized by the schema 0 0 01000001111 10 0101 where all identical positions are retained and differing positions marked with a. Definition The order of a schema o(s) is the number of fixed positions (1 or 0 but not ). The above example has order equals 19. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 5 / 37

Historical perspective, definitions and examples. Basic facts from Biology All living organisms consist of cells containing the same set of one or more chromosomes i.e strings of DNA. A gene seen as an encoder of a characteristic, such as eye color. The different possibilities for a characteristic (eg, brown, green, blue, gray) are called alleles. Each gene is located at a particular position (locus) on the chromosome. Most organisms have multiple chromosomes in each cell. The sum of all chromosomes i.e the complete collection of genetic material is called the genome of the organism. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 6 / 37

Historical perspective, definitions and examples. Basic facts from Biology cont d Genotype refers to the particular set of genes contained in a genome Organisms whose chromosome are arranged in pairs are called diploid, else haploid. Most sexually producing species are haploid since after meiosis, the number of chromosomes in gametes are halved. For reproduction, the genes of the parents are combined to form a new diploid set of chromosomes Offsprings are subject to mutation where elementary parts of the DNA are changed. The fitness of an organism is defined as its probability to reproduce or as a function of the number of offspring the organism has produced. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 7 / 37

Historical perspective, definitions and examples. Extracts from Biology to Genetic Algorithm Chromosome refers to a solution candidate. Genes are single bits or small blocks of neighboring bits that encode a particular element of the solution. Alleles are values 0, 1 and. Crossover operates by exchanging genetic material between two haploid parents. Mutation is implemented by flipping the bit at a randomly chosen locus. Remark Most applications of GA employ haploid single chromosome individuals. The major genetic operators used in GA are parent selection, crossover, mutation and replacement. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 8 / 37

Genetic Algorithms and its mathematical formulation Genetic Algorithms and its mathematical formulation Definition GA is a searching process based on the laws of natural selection. It consists of three operations: selection, genetic operation(crossover and mutation) and replacement. Remark The population comprises a group of chromosomes from which candidates can be selected randomly for the solution of a problem. Remark The fitness values of all the chromosomes are evaluated by calculating the objective function in a decoded form(phenotype). Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 9 / 37

Genetic Algorithms and its mathematical formulation Fitness techniques Two broadbased methods of fitness techniques: 1 Windowing: f i = c ± (v i v m ) (1) where v i and v m are the objective values of chromosome i and worst chromosome m respectively. 2 Linear normalization where d is the decrement rate. f i = f best (i 1) d (2) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 10 / 37

Genetic Algorithms and its mathematical formulation Top level description of a Genetic Algorithm. A simple Genetic Algorithm follows these steps: 1 Randomly generate an initial population X (0) := (x 1, x 2,..., x N ) 2 Compute the fitness F (x i ) of each chromosome x i in the current population. 3 Create new chromosomes X r (t) by mating current chromosomes, applying mutation and recombination as the parent chromosomes mate. 4 Delete numbers of the population to make room for the new chromosomes. 5 Compute the fitness of X r (t) and insert this into population 6 t := t + 1, if not (end-test) go to step 3, or else stop and return the best chromosome. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 11 / 37

Genetic Algorithms and its mathematical formulation Mathematical formulation of the Schemata equation Mathematical formulation of the schemata equation Remark The mathematics of the schemata theory done by using the effects of selection and genetic operation(crossover and mutation). Remark Since a schema represents a set of strings, associate a fitness value f(s,t) with schema S, and the average fitness of the schema. f(s,t) is determined by all the matched strings in the population. Remark Usage of proportional extimation in the reproduction phase enables us to extimate the number of matched strings of a schema S in the next generation. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 12 / 37

Genetic Algorithms and its mathematical formulation Effect of selection Mathematical formulation of the Schemata equation Let M(S,t) be the number of occurrences of a particular schema S in a population of n individuals at time t. The bit string A i of individual i gets selected for reproduction with the probability P i = f (S, t) F (t) where F (t) is the average fitness value of the current generation. The expected number of occurrences of S in the next generation is (3) M(S, t + 1) = M(S, t) f (S, t) F (t). (4) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 13 / 37

Genetic Algorithms and its mathematical formulation Effect of selection cont d Mathematical formulation of the Schemata equation By change of variable ɛ = f (S, t) F (t) F (t) and substituting equation (5) into equation (4), we have (5) M(S, t) = M(S, 0)(1 + ɛ) t. (6) Equation (6) shows that an above average schema receives an exponential increasing number of strings in the next generations. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 14 / 37

Genetic Algorithms and its mathematical formulation Mathematical formulation of the Schemata equation Effect of crossover and mutation The effect of crossover and mutation can be incorporated in the formulation of the schemata equation. Remark During evolution of a genetic algorithm, interferences come from genetic operations which affect the current schemata. Definition The defining length of a schema α(s) is the distance between the outermost fixed positions. Example The defining lengths of 000 and 1 00 are 2 and 3 respectively. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 15 / 37

Genetic Algorithms and its mathematical formulation Effect of crossover and mutation cont d Mathematical formulation of the Schemata equation Assuming the length of the chromosome is L and a point crossover is applied. In general, a crossover point is selected uniformly among L-1 possible positions. The probability of destruction of a schema S is given by The probability of a schema survival is P d (S) = α(s) L 1. (7) P s (S) = 1 P d (S) = 1 α(s) L 1. (8) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 16 / 37

Genetic Algorithms and its mathematical formulation Mathematical formulation of the Schemata equation Assuming an operation rate of crossover of P c, the new probability of survival becomes P s (S) 1 P c α(s) L 1 The probability effect of selection(reproduction) and crossover been independent events is the product of the respective probabilities. Hence the equation for the expected occurence of a schema S at time t+1 is M(S, t + 1) = M(S, t) f (S, t) F (t) (9) (1 P c α(s) L 1 ) (10) Equation (10) tells us the rate of increase of the schemata over time is proportional to their relative fitness and inversely proportional to their length. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 17 / 37

Genetic Algorithms and its mathematical formulation Effect of mutation Mathematical formulation of the Schemata equation Mutation can affect a schema S at each of it o(s) positions with mutation probability P m. The probability of survival of a single constant position in a schema is P s (S) = 1 P m. (11) The probability of survival of the entire schema is P s (S) = (1 P m ) o(s). (12) For small P m, equation (12) can be approximated by P s 1 o(s) P m. (13) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 18 / 37

Genetic Algorithms and its mathematical formulation Total effects Mathematical formulation of the Schemata equation By combining equation (10) and (13), we have the formula for the expected count of a schema: M(S, t + 1) = M(S, t) f (S, t) F (t) (1 P c α(s) L 1 o(s) P m). (14) Recall the schema outperformance factor ɛ i.e equation ( 5), equation (14) can be rewritten as M(S, t + 1) = M(S, t) (1 + ɛ) (1 P c α(s) L 1 o(s) P m). (15) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 19 / 37

Genetic Algorithms and its mathematical formulation Mathematical formulation of the Schemata equation Equation (15) is of the form M t = M 0 (1 + ɛ) t f (P c, P m, L, α(s)). (16) Equation (16) says that the number of schemata better than average will exponentially increase over time. Basic rationale behind Genetic Algorithms If the linear representation of a problem allows the formation of a schemata, then the genetic algorithm can efficiently produce individuals that continuously improve in terms of fitness function. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 20 / 37

Genetic Programming Genetic Programming Remark Genetic programming is an extension of Genetic algorithm. The main concepts here are also those concepts from biology vis a vis selection, crossover and mutation, and replacement. It is the main function to control the genetic algorithm. Definition Genetic programming is a machine learning algorithm. It uses a fixed representation for programs and creates a recombination operation that could combine programs to produce offspring programs. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 21 / 37

Genetic Programming Genetic Programming preparatory steps. Two important steps towards solving a problem using Genetic programming. Define a set or series of functions. 1 Some functions may return a variable or input. 2 Other functions may perform an operation: +,, >, <, if-then-else, do, for are all functions Define the fitness of the program: 1 How many events does it classify correctly? 2 Does it provide the correct output for some cases? 3 Does it fit the data? Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 22 / 37

Genetic Programming Genetic programming process. Figure: Schematics of a typical genetic programming process Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 23 / 37

Genetic Programming Representation via Trees. Genetic programming basics easier shown using the tree structure. Definition A tree is a connected graph that contains no circuits. Example Consider the python code def main (): sumadd =float(sumadd) f = float(f) g = float(g) if f > g: sumadd = f f + g else: sumadd = g g + f main() Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 24 / 37

The tree Structure Genetic Programming Another example of tree representation: Figure: Schematics of a tree structure Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 25 / 37

Figure: Schematics of crossover Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 26 / 37 Crossover schematics Genetic Programming

Genetic Programming Many mutation schematics Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 27 / 37

Application of Genetic programming in Biological Research Genetic programming application in genomics Genetic programming have been applied in the following areas of genomics Genetic network inference Gene expression data classification SNP analysis Epistasis analysis Gene annotation Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 28 / 37

Application of Genetic programming in Biological Research Breast Cancer Research The analysis of SNP s unveils the relationships between the genotypic and phenotypic information as well as the identification of SNP s related to a disease. Feature extraction is one of the most important research areas in pattern recognition challenges. Two major steps in feature extraction: Extraction of relevant information from raw data to an original feature vector with m dimensions. Creation of Extracted feature vector with n dimensions (m > n) from the parameter vector. Principal Component Analysis (PCA) and Fisher Linear Discriminant Analysis(FLDA) are the two major linear feature extractors. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 29 / 37

Application of Genetic programming in Biological Research Principal Component Analysis Good for feature extraction and dimensionality reduction in terms of linear transformation. The m m covariance matrix is given by S = N (x i η)(x i η) T (17) i=1 where x i is the ith observation, 1 i N, N is the number of observations, and η is the global mean η = N i=1 x i N. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 30 / 37

Application of Genetic programming in Biological Research PCA Cont d By solving the eigenvalue problem, SW T i = λ i W T i 1 i m (18) where λ is a set of eigenvalues and W is a set of eigenvectors corresponding to the eigenvalues, the n components of a given observation vector is obtained. The n principal components of the projected feature vector Y are obtained by the largest n eigenvalues and corresponding eigenvectors. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 31 / 37

Application of Genetic programming in Biological Research Fisher Linear Discriminant Analysis This is the classical method for real valued feature extraction using a linear transformation. Maximizes the ratio of between-class scatter and within-class scatter of projected features. For k classes, the ith observation vector from class j is defined by x ij where 1 j k, 1 i N j, and N j is the number of observations from class j. The within-class covariance matrix is defined by where N j k S W = S j (19) j=1 S j = (x ji η j )(x ji η j ) T (20) i=1 Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 32 / 37

Application of Genetic programming in Biological Research FLDA cont d The between-class covariance matrix is defined by S B = k N j (η j η)(η j η) T (21) j=1 where η j is the mean of class j and η is the global mean. The Fisher discrimant criterion is defined to maximize the objective function J(ρ) = W T S B W W T (22) S W W The optimal transformation matrix W is obtained by S B W T i = λ i S W W T i, 1 i m (23) Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 33 / 37

Application of Genetic programming in Biological Research It is worth the salt Usage in the analysis of large, complex data concerning cancer research and better clinical undestanding. Ability to find useful combination of features from a molecular data set in an unbiased manner. Production of realistic and understandable non linear models to describe diseases and their states. Creation of human readable results in contrast to the black box solution. Characterization of the diseases. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 34 / 37

Application of Genetic programming in Biological Research Limitations of Genetic Programming Limitations of Genetic Programming It is not an optimization algorithm : Finding the best solution to a problem Finding solutions that are overfit to data in consideration. Find the most parsimonious solution that meets our halting criteria. It is computationally intensive process that require tens or hundreds of thousands of programs to be tested. It should not be seen as an all in all solution that can be run on a desktop machine to analyse biological data. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 35 / 37

Application of Genetic programming in Biological Research Limitations of Genetic Programming Open questions and future research: Beagle deck and Darwins insights have formed some strong basis for solving some intricate problems even in this modern days. Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 36 / 37

Application of Genetic programming in Biological Research Limitations of Genetic Programming Open questions and future research: Beagle deck and Darwins insights have formed some strong basis for solving some intricate problems even in this modern days. How far can Genetic programming go? Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 36 / 37

Application of Genetic programming in Biological Research Limitations of Genetic Programming Open questions and future research: Beagle deck and Darwins insights have formed some strong basis for solving some intricate problems even in this modern days. How far can Genetic programming go? Can we make it less expensive? Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 36 / 37

Application of Genetic programming in Biological Research Limitations of Genetic Programming THANK YOU!... Charles Chilaka (MUN) Genetic algorithms and programming Bio 4241 37 / 37