Classification Using Genetic Programming. Patrick Kellogg General Assembly Data Science Course (8/23/15-11/12/15)

Similar documents
Midterm Examination CS 540-2: Introduction to Artificial Intelligence

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall

Genetic Programming. and its use for learning Concepts in Description Logics

Genetic programming. Lecture Genetic Programming. LISP as a GP language. LISP structure. S-expressions

Hybrid Adaptive Evolutionary Algorithm Hyper Heuristic

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter

ADVANCED CLASSIFICATION TECHNIQUES

Midterm Examination CS540-2: Introduction to Artificial Intelligence

A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each

Machine Learning: Algorithms and Applications Mockup Examination

Beyond Classical Search: Local Search. CMPSCI 383 September 23, 2011

Kyrre Glette INF3490 Evolvable Hardware Cartesian Genetic Programming

CS 540: Introduction to Artificial Intelligence

Genetic Programming. Modern optimization methods 1

Genetic Algorithms for Classification and Feature Extraction

Fast Knowledge Discovery in Time Series with GPGPU on Genetic Programming

Advanced Search Genetic algorithm

Genetic Algorithms and Genetic Programming. Lecture 9: (23/10/09)

Genetic Programming of Autonomous Agents. Functional Requirements List and Performance Specifi cations. Scott O'Dell

Woody Plants Model Recognition by Differential Evolution

Genetic Algorithms Variations and Implementation Issues

Genetic Algorithms and Genetic Programming Lecture 9

Structural Optimizations of a 12/8 Switched Reluctance Motor using a Genetic Algorithm

Algorithm Design (4) Metaheuristics

CS 4700: Artificial Intelligence

Evolutionary Computation. Chao Lan

Beyond Classical Search

Escaping Local Optima: Genetic Algorithm

Using Genetic Algorithms to Solve the Box Stacking Problem

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

Predicting Diabetes using Neural Networks and Randomized Optimization

FOREST PLANNING USING PSO WITH A PRIORITY REPRESENTATION

Decision Tree Learning

Introduction to Artificial Intelligence

Chapter 14 Global Search Algorithms

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

A Genetic Algorithm-Based Approach for Building Accurate Decision Trees

Clustering & Classification (chapter 15)

Evolutionary Search in Machine Learning. Lutz Hamel Dept. of Computer Science & Statistics University of Rhode Island

Local Search and Optimization Chapter 4. Mausam (Based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld )

Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization

Algorithms & Complexity

A Genetic Algorithm for Minimum Tetrahedralization of a Convex Polyhedron

Gen := 0. Create Initial Random Population. Termination Criterion Satisfied? Yes. Evaluate fitness of each individual in population.

Santa Fe Trail Problem Solution Using Grammatical Evolution

Neural Network Weight Selection Using Genetic Algorithms

Geometric Routing: Of Theory and Practice

CS 380: Artificial Intelligence Lecture #4

Genetic Algorithms. PHY 604: Computational Methods in Physics and Astrophysics II

Multi-label classification using rule-based classifier systems

Fall 2003 BMI 226 / CS 426 Notes F-1 AUTOMATICALLY DEFINED FUNCTIONS (ADFS)

Hill Climbing. Assume a heuristic value for each assignment of values to all variables. Maintain an assignment of a value to each variable.

Non-deterministic Search techniques. Emma Hart

A Tree Kernel Based Approach for Clone Detection

Genetic Programming Prof. Thomas Bäck Nat Evur ol al ut ic o om nar put y Aling go rg it roup hms Genetic Programming 1

Semantically-Driven Search Techniques for Learning Boolean Program Trees

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland

MOGA for NSLS2 DA Optimization

Using Machine Learning to Optimize Storage Systems

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Separating Axis Test (SAT) and Support Points in 2D. Randy Gaul Special thanks to Dirk Gregorius for his GDC 2013 lecture and slides

Feature Subset Selection using Clusters & Informed Search. Team 3

University of Waterloo Department of Electrical and Computer Engineering ECE 457A: Cooperative and Adaptive Algorithms Midterm Examination

Introduction to Design Optimization: Search Methods

Introduction (7.1) Genetic Algorithms (GA) (7.2) Simulated Annealing (SA) (7.3) Random Search (7.4) Downhill Simplex Search (DSS) (7.

Lecture on Modeling Tools for Clustering & Regression

Internal vs. External Parameters in Fitness Functions

Outline. Best-first search. Greedy best-first search A* search Heuristics Local search algorithms

The Basics of Decision Trees

Evolving SQL Queries for Data Mining

Learning Composite Operators for Object Detection

Introduction to Artificial Intelligence

Local Search (Ch )

Automated Program Repair through the Evolution of Assembly Code

Administrative. Local Search!

Genetic Algorithms for Vision and Pattern Recognition

Artificial Intelligence

Artificial Intelligence

Solving Traveling Salesman Problem Using Parallel Genetic. Algorithm and Simulated Annealing

Trees 2: Linked Representation, Tree Traversal, and Binary Search Trees

Supervised Learning for Image Segmentation

Machine Evolution. Machine Evolution. Let s look at. Machine Evolution. Machine Evolution. Machine Evolution. Machine Evolution

Evolutionary Algorithms: Perfecting the Art of Good Enough. Liz Sander

Evolutionary Computation, 2018/2019 Programming assignment 3

Lecture 6: Genetic Algorithm. An Introduction to Meta-Heuristics, Produced by Qiangfu Zhao (Since 2012), All rights reserved

Informed search algorithms. (Based on slides by Oren Etzioni, Stuart Russell)

K-Consistency. CS 188: Artificial Intelligence. K-Consistency. Strong K-Consistency. Constraint Satisfaction Problems II

An O(n) Approximation for the Double Bounding Box Problem

ADAPTATION OF REPRESENTATION IN GP

ARTIFICIAL INTELLIGENCE. Informed search

Heuristic Optimization Introduction and Simple Heuristics

CAD Algorithms. Placement and Floorplanning

Genetic improvement of software: a case study

OPTIMIZATION METHODS. For more information visit: or send an to:

Artificial Intelligence

The k-means Algorithm and Genetic Algorithm

Lecture 4. Convexity Robust cost functions Optimizing non-convex functions. 3B1B Optimization Michaelmas 2017 A. Zisserman

ATI Material Do Not Duplicate ATI Material. www. ATIcourses.com. www. ATIcourses.com

Artificial Intelligence. Chapters Reviews. Readings: Chapters 3-8 of Russell & Norvig.

A Classifier with the Function-based Decision Tree

Transcription:

Classification Using Genetic Programming Patrick Kellogg General Assembly Data Science Course (8/23/15-11/12/15)

Iris Data Set

Iris Data Set

Iris Data Set

Iris Data Set

Iris Data Set Create a geometrical boundary for the class Setosa

Automatically Creating Functions def IsInClass(x,y): if ( (y > (2*x + 10)) \ and (y > (0.3*x + 4.5)) \... and (x < 5)): return true else: return false

Evolving Parameters (y > (2x + 10) and (y > (0.3x + 4.5))...

Evolving Parameters (y > (2x + 10) and (y > (0.3x + 4.5))... y > β 1 x + α 1 y > β 2 x + α 2...

Evolving Parameters (y > (2x + 10) and (y > (0.3x + 4.5))... y > β 1 x + α 1 y > β 2 x + α 2... = Genetic Programming (GP)

Two-slide Introduction to Genetic Algorithms (Part 1)

Two-slide Introduction to Genetic Algorithms (Part 1) Number legs = 6 N6

Two-slide Introduction to Genetic Algorithms (Part 1) Number legs = 4 Length legs = 8 N4 L8

Two-slide Introduction to Genetic Algorithms (Part 1) Number legs = 4 Length legs = 8 Size = 6 N4 L8 S6

Two-slide Introduction to Genetic Algorithms (Part 1) Number legs = 0 Length legs = 8 Size = 3 Energy = 20 N0 L8 S3 E20

Two-slide Introduction to Genetic Algorithms (Part 2) N6 L4 S3 E10 N4 L8 S3 E10 N4 L8 S6 E10 N0 L8 S3 E20 Initial Population

Two-slide Introduction to Genetic Algorithms (Part 2) N6 L4 S3 E10 N4 L8 S3 E10 N4 L8 S6 E10 N0 L8 S3 E20 N6 L4 S3 E10 = 26 N4 L8 S3 E10 = 14 N4 L8 S6 E10 = 32 N0 L8 S3 E20 = 0 Fitness Function

Two-slide Introduction to Genetic Algorithms (Part 2) N6 L4 S3 E10 N4 L8 S3 E10 N4 L8 S6 E10 N0 L8 S3 E20 N6 L4 S3 E10 = 26 N4 L8 S3 E10 = 14 N4 L8 S6 E10 = 32 N0 L8 S3 E20 = 0 Selection N6 L4 S3 E10 N4 L8 S6 E10

Two-slide Introduction to Genetic Algorithms (Part 2) N6 L4 S3 E10 N4 L8 S3 E10 N4 L8 S6 E10 N0 L8 S3 E20 N6 L4 S3 E10 = 26 N4 L8 S3 E10 = 14 N4 L8 S6 E10 = 32 N0 L8 S3 E20 = 0 N6 L4 S3 E10 N4 L8 S6 E10 N7 L4 S3 E10 Mutation

Two-slide Introduction to Genetic Algorithms (Part 2) N6 L4 S3 E10 N4 L8 S3 E10 N4 L8 S6 E10 N0 L8 S3 E20 N6 L4 S3 E10 = 26 N4 L8 S3 E10 = 14 N4 L8 S6 E10 = 32 N0 L8 S3 E20 = 0 N6 L4 S3 E10 N4 L8 S6 E10 N7 L4 S3 E10 N6 L4 S6 E10 N4 L8 S3 E10 Crossover

Syntax Tree-Based GP and > β 1 x + α 1 < β 2 x + α 2 or > β 3 x + α 3

Syntax Tree-Based GP and > β 1 x + α 1 < β 2 x + α 2 or > β new x + α 3 Mutation

Syntax Tree-Based GP and > β1 x + α3 < β 2 x + α 2 or > β 3 x + α1 Crossover

My Python Code Randomly create an initial population of 12 linear candidates Run fitness function on all 12 Select top 2 candidates Mutate each four times (+α, -α, +β, -β) Crossover twice Repeat until error is small enough for next step (which is to add or remove a terminal from the tree)

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Sample Run of Hill-climbing

Future Work Evolving other shapes that aren t linear

Future Work Evolving other shapes that aren t linear

Future Work Evolving other shapes that aren t linear

Future Work Evolving other shapes that aren t linear Definition of a circle: (x-h) 2 + (y-k) 2 = r 2.

Future Work Evolving other shapes that aren t linear Definition of a circle: (x-h) 2 + (y-k) 2 = r 2.

Future Work Evolving other shapes that aren t linear Definition of a circle: (x-h) 2 + (y-k) 2 = r 2.

Database look-up Future Work

Database look-up Future Work

Database look-up Future Work

Future Work Database look-up Enables bi-directional search

Future Work Automatically turn results into python function Recode for multi-dimensional data Mutate parameters based on error delta Speed up search (aka smash into centroid) Concave shapes ( or as well as and ) Study initial population size, distribution Play with function size reward Density Look at Specificity vs. Sensitivity vs. size trade-off A three-legged stool and difficult to tune

Backup Slides

Fitness Function = (((1-α) + (1-β)) / 2) * function size reward = ((specificity + power (or sensitivity))/2) * size Where: α = false positive rate β = false negative rate more on this next Function goes from 0 (worst) to 1 (best)

Creating Dummy Data Vs. Whitespace First attempt: create dummy data (with same density as class data Final solution: let the amount of whitespace determine the false positive rate (the specificity)

Crossover and Deleting/Adding Leaves Adding leaves Once the error reaches a steady state, a new linear candidate may be added Deleting leaves Or randomly, a candidate may be introduced that has a leaf (or an entire subtree) missing Prevents overfitting

Mutation and Error Rate Save the previous fitness value to calculate a good next mutation Another good idea is to smash the line towards the centroid of the class until it hits the edge of the data