Genetic Algorithms for Classification and Feature Extraction

Genetic Algorithms for Classification and Feature Extraction
Min Pei, Erik D. Goodman, William F. Punch III and Ying Ding (1995), Genetic Algorithms for Classification and Feature Extraction, Michigan State University, Genetic Algorithms Research and Applications Group (GARAGe), CSNA-95.
Group 4: Hari Prasath Raman - 110283168, Venkatakrishnan Rajagopalan - 110455765

What is a GA? "Select the best, discard the rest."
A randomized heuristic search/optimization strategy that simulates natural selection. The initial population is composed of candidate solutions; the population evolves so that strong and diverse candidates can emerge via mutation and crossover (mating).

Basic Components
- Encoding: representing the solution in the form of a string
- Successor function(s): mutation, crossover
- Fitness function
- Parameters: population size, generation limit
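To make these components concrete, here is a minimal generational GA sketch in Python. This is our own illustration, not the paper's implementation; the function and parameter names are ours.

```python
import random

def evolve(fitness, n_bits, pop_size=50, generations=100,
           crossover_rate=0.8, mutation_rate=0.01):
    """Minimal generational GA over fixed-length bit strings."""
    # Encoding: each candidate solution is a list of 0/1 genes.
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):                  # generation limit
        scored = sorted(((fitness(c), c) for c in pop), reverse=True)
        new_pop = [scored[0][1]]                  # keep the current best
        while len(new_pop) < pop_size:
            # Tournament selection: the fitter of two random candidates.
            p1 = max(random.sample(scored, 2))[1]
            p2 = max(random.sample(scored, 2))[1]
            child = p1[:]
            if random.random() < crossover_rate:  # one-point crossover
                cut = random.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Toy fitness: count of 1-bits ("one-max").
print(evolve(sum, n_bits=20))
```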

A different approach...
The data mining techniques we have learned so far are based on modern database languages. Due to the high dimensionality of the data, a numerical approach is needed; the approach used here is 'statistical'. Data mining applies to many kinds of data, and the type of data determines the language to be used. The data used here is always a set of SPECIAL tuples: a data vector is similar to a record in a unified table. The approach does not apply to an arbitrary set of data (a collection of different tuples). We represent attributes as features (just a different term).

Pattern Recognition and Classification System

Curse of Dimensionality
Extra information often leads to "non-optimal" use of the data. Removing redundant and irrelevant features increases classifier reliability, but using too few features, or non-representative features, can make classification difficult.

Feature (Attribute) Selection
The task of finding the "best" subset of d features from the initial N features in the data pattern space. The criterion used to define the best subset is usually the probability of misclassification.

Feature (Attribute) Extraction
A transformation from pattern space to feature space such that the new feature set gives better separation of the pattern classes and reduces dimensionality. Feature extraction subsumes feature selection: selection is the special case where the transformation is an identity mapping that simply keeps or drops each original feature.

Formal Definition of Feature Selection
Define the set of N features representing the pattern in the pattern space as a vector x = [x_1, x_2, ..., x_N]. The goal is to find the best subset of features y = [y_1, y_2, ..., y_n] that satisfies the optimal performance assigned by a criterion function J(.), ideally the probability of misclassification:
J(y*) = min_y J(y), the minimum taken over all subsets y of size n.

Formal Definition of Feature Extraction
The goal is to yield a new feature vector of lower dimension by defining a mapping function M(.) such that y = M(x) satisfies the optimal performance assigned by the criterion function J(.). Applying M creates a y of lower dimension than x:
y = M(x), with J(M*(x)) = min_M J(M(x)).
In general, M(x) can be linear or non-linear, but the majority of existing methods are restricted to a linear mapping y = Wx, where W is an (n x N) matrix of coefficients.

Classification & Feature Evaluation
Feature selection and extraction are crucial in optimizing performance and strongly affect classifier design. There is an inherent relation: a feedback linkage between feature evaluation and classification. Here, feature extraction and classifier design are carried out simultaneously through "genetic learning and evolution".

Feature Extractor and Classifier with Feedback Learning System

Benefits of GA
- GAs search from a population, not a single point.
- They discover new solutions by speculating on many combinations of the best candidate solutions within the current population.
- They remain effective on multi-class, high-dimensionality problems.
- They are a global optimum search method.

Approach
- GA/KNN hybrid approach
- GA/RULE approach

*Slides from Prof. Leman Akoglu's lectures

GA/KNN Approach

Non-Linear Transformation
Weight components that move toward 0 indicate that their corresponding features are not important for the discrimination task; these features are dropped during feature extraction. The resulting weights indicate the usefulness of a particular feature, i.e., its discriminatory power.
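A minimal sketch of how near-zero weights translate into dropped features; the threshold eps and the function name are our assumptions, not the paper's:

```python
import numpy as np

def select_by_weight(X, w, eps=0.05):
    """Keep only the feature columns whose evolved weight is non-negligible.

    X   : (samples, N) data matrix
    w   : (N,) weight vector found by the GA
    eps : threshold below which a feature counts as unused (assumed)
    """
    keep = np.abs(w) > eps              # boolean mask of useful features
    return X[:, keep], keep

# Hypothetical weights: the second feature drifted to ~0 and is dropped.
X = np.random.rand(5, 3)
X_reduced, keep = select_by_weight(X, np.array([2.3, 0.01, 7.8]))
print(keep)                              # [ True False  True]
```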

Problems in Running the GA
- Chromosome encoding: for feature selection, w_i ∈ {0, 1}, 1 <= i <= N, so the chromosome requires a single bit for each w_i, a component of the weight vector w. For feature extraction, w_i ∈ [0, 10], 1 <= i <= N, and the weights' resolution is determined by the number of bits used (see the decoding sketch below).
- Normalization of training datasets: KNN evaluation is affected by scaling, so we must pre-normalize the training data to some range such as [0, 1].
- Evaluation criteria: fitness functions.
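As an illustration of the two encodings, a decoding sketch (ours; the gene width and names are assumptions):

```python
def decode_weights(bits, bits_per_weight=8, w_max=10.0):
    """Decode a flat bit chromosome into weights in [0, w_max].

    With bits_per_weight = 1 and w_max = 1 this reduces to feature
    selection (w_i in {0, 1}); more bits per weight give finer
    resolution for feature extraction.
    """
    weights = []
    for i in range(0, len(bits), bits_per_weight):
        gene = bits[i:i + bits_per_weight]
        value = int("".join(map(str, gene)), 2)       # unsigned integer
        weights.append(w_max * value / (2 ** bits_per_weight - 1))
    return weights

# Two features, 8 bits each: all ones -> 10.0, all zeros -> 0.0.
print(decode_weights([1] * 8 + [0] * 8))              # [10.0, 0.0]
```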

Fitness Function
The classifier is defined as a function from pattern space to feature space and then to classification space. In the fitness formula, n_min is the cardinality of the near-neighbor minority set and K is the number of nearest neighbors; the constants gamma and delta are used for tuning the algorithm.
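The formula itself is not reproduced in this transcript. Purely as a hedged sketch, here is how the quantities it references (the K nearest neighbors under the evolved weights, and the size n_min of the near-neighbor minority set) could be computed for one sample; all names are ours, and the gamma/delta tuning terms are omitted:

```python
import numpy as np

def knn_vote(x, X_train, y_train, w, K=5):
    """Weighted-KNN vote for one sample x.

    Returns the majority class among the K nearest neighbors and
    n_min, the number of those neighbors in the minority classes.
    """
    dists = np.linalg.norm((X_train - x) * w, axis=1)   # weighted distance
    nearest = y_train[np.argsort(dists)[:K]]            # K neighbor labels
    labels, counts = np.unique(nearest, return_counts=True)
    majority = labels[np.argmax(counts)]
    n_min = int((nearest != majority).sum())
    return majority, n_min
```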

Need for Speed
A GA can easily be implemented as a parallel algorithm. Most of the computational time is spent evaluating chromosomes, so the idea is to distribute the evaluation of the individuals in the population across several nodes (processors).
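A minimal master-slave sketch of this idea using Python's multiprocessing; the evaluation function is a stand-in:

```python
from multiprocessing import Pool

def evaluate(chromosome):
    # Stand-in fitness; in practice this would run the weighted-KNN
    # evaluation of one chromosome against the training set.
    return sum(chromosome)

if __name__ == "__main__":
    population = [[i % 2] * 16 for i in range(100)]
    with Pool() as pool:            # one worker process per CPU core
        scores = pool.map(evaluate, population)
    print(max(scores))
```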

GA/KNN is not good enough
The computational cost of the GA/KNN method is very high and requires parallel or distributed processing. This cost comes from performing feature selection and extraction on high-dimensionality data patterns. We can decrease the computational cost without sacrificing performance by generating rules directly, i.e., using a GA combined with a production (rule) system.

GA/RULE Approach
A GA combined with a production rule system, focused on the classification of binary feature patterns. A single, integrated rule format is trained against a known training sample set; the result is a small set of best classification rules. The GA directly manipulates the rule representation used for classification, rather than a transformation feeding a KNN rule.

Advantages
- Simpler to implement.
- Requires substantially fewer computation cycles to achieve answers of similar quality.
- Accuracy is significantly better than the classical KNN method, and good rules are created for classifying unknown data.
- Reveals which features are important for the classification, based on the features used by the rules.

Classification Rule Format
<classification_rule> ::= <condition> : <message>
A classification rule is a production rule used to make a decision assigning a pattern x to one of many classes. The <condition> part of the rule is a string consisting of k class-attribute vectors. Each vector consists of n elements, where n is the number of attributes (features) used for classifying that class. The class-attribute vectors determine the features used in the rule's classification decision, i.e., they act as the classification predicate. The <message> indicates the class into which the rule places an input pattern that matches the feature vectors.

Classification Rule Format
<classification_rule> ::= <condition> : <message>
(Y_11, ..., Y_1i, ..., Y_1n), ..., (Y_j1, ..., Y_ji, ..., Y_jn), ..., (Y_k1, ..., Y_ki, ..., Y_kn) : ω
where i = 1, 2, ..., n and j = 1, 2, ..., k. ω is a variable whose value can be one of the k classes. The alphabet of the class-feature vectors consists of 0, 1 and a "don't care" character, i.e., Y_ji ∈ {0, 1, #}.

Training and Evaluation Using the GA
The training data consists of training vectors (records) X with known classification labels: X = [X_1, X_2, ..., X_i, ..., X_n], where i = 1, 2, ..., n and X_i ∈ {0, 1}. Each rule is evaluated by matching it against the training data set: every class-feature vector of the rule's condition is compared with the training vector at every position. A 0 matches a 0, a 1 matches a 1, and a # (don't care) matches either 0 or 1.

Training and Evaluation Using the GA
For a training set with three classes, the training vector is compared with the three class vectors in each rule. The number of matching features in each vector is counted, and the vector with the highest number of matches determines the class of the sample. Since the class of each training sample is already known, the classification can be judged correct or not, and the decision rule directly rewarded or punished accordingly. From this rule strength, the GA evolves new rules.
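A small sketch of this matching-and-voting step, following the rule format above (names are ours):

```python
def match_count(class_vector, sample):
    """Positions where a class-feature vector matches a binary sample;
    '#' (don't care) matches either 0 or 1."""
    return sum(y == '#' or int(y) == x
               for y, x in zip(class_vector, sample))

def classify(rule, sample):
    """Assign the sample to the class whose vector matches best."""
    counts = [match_count(v, sample) for v in rule]
    return counts.index(max(counts))        # index of the winning class

# Three classes (k=3), four features (n=4), one training vector.
rule = [['1', '0', '#', '1'],               # class 0
        ['0', '0', '1', '#'],               # class 1
        ['#', '1', '1', '0']]               # class 2
print(classify(rule, [1, 0, 0, 1]))         # -> 0 (four matches)
```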

GA/RULE Approach

Genetic Operators
- Crossover: standard one-point crossover
- Mutation: standard bit-modification
- The entire population (except for the best solution) is replaced each generation
Fitness Function
Fitness = CorrectPats/TotPats + α · n_dontcare/TotPats
This fitness function guides the GA to search out the best rule sets as a multi-objective optimization problem.
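A sketch of these operators applied to a flattened rule string. Reading "standard bit-modification" as symbol substitution over the {0, 1, #} alphabet is our interpretation:

```python
import random

def one_point_crossover(a, b):
    """Standard one-point crossover of two equal-length rule strings."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(genes, rate=0.02, alphabet=('0', '1', '#')):
    """Position-wise mutation: replace a gene with a random symbol."""
    return [random.choice(alphabet) if random.random() < rate else g
            for g in genes]

# Two parents for a k=2, n=3 rule, flattened to six genes each.
parent1, parent2 = list('10#011'), list('01#1#0')
child1, child2 = one_point_crossover(parent1, parent2)
print(''.join(mutate(child1)))
```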

Determining Invalid Features
For each rule, we arrange the class vectors as a matrix and check each position (column): if every class vector has the same value (1 or 0) or a don't-care (#) there, the n_dontcare variable is incremented, as that feature is useless for classification.
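A sketch of this column check, feeding the n_dontcare term of the fitness above; the classify function is the matching sketch given earlier, passed in here to keep the snippet self-contained:

```python
def n_dontcare(rule):
    """Count columns where all class vectors agree (same value or '#'):
    such a feature cannot discriminate between classes."""
    useless = 0
    for column in zip(*rule):                # one feature across classes
        if len({s for s in column if s != '#'}) <= 1:
            useless += 1
    return useless

def rule_fitness(rule, samples, labels, classify, alpha=0.1):
    """Fitness = CorrectPats/TotPats + alpha * n_dontcare/TotPats."""
    correct = sum(classify(rule, s) == c
                  for s, c in zip(samples, labels))
    return (correct + alpha * n_dontcare(rule)) / len(samples)

# Columns 0 and 2 carry no class information here -> n_dontcare == 2.
rule = [['1', '0', '#'],
        ['1', '1', '#'],
        ['#', '0', '#']]
print(n_dontcare(rule))                      # 2
```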

Summary
- GAs play an important role in classification and feature extraction for high-dimensionality, multi-class data patterns.
- This is an automatic pattern recognition method that uses feedback information from the classifier to change the decision space.
- GA/KNN works by transforming the decision space and reducing its dimensionality.
- GA/RULE works by modifying the decision rule through inductive learning and evolution.
- GAs can be combined with other classifiers to solve other complex pattern recognition problems.