Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms
David Montana
Presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang
April 12, 2005

Neural Networks
Neural networks generally consist of five components:
A directed graph, known as the network topology, whose nodes represent the neurodes (or processing elements) and whose arcs represent the connections.
A state variable associated with each neurode.
A real-valued weight associated with each connection.
A real-valued bias associated with each neurode.
A transfer function $f$ for each neurode, such that the state of the neurode is $f\left(\sum_i \omega_i x_i - \beta\right)$.

Genetic Algorithms
Genetic algorithms require five components:
A way of encoding solutions to the problem on chromosomes.
An evaluation function which returns a rating for each chromosome given to it.
A way of initializing the population of chromosomes.
Operators that may be applied to parents when they reproduce to alter their genetic composition. Standard operators are mutation and crossover.
Parameter settings for the algorithm, the operators, and so forth.

Genetic Algorithms
(19.3, 0.05, 1.2, 345, 2.0, 7.7, 68.0) -> Mutation -> (19.3, 0.05, 1.2, 345, 2.0, 8.2, 68.0)
Figure 1: Mutation Operation
(19.3, 0.05, 1.2, 345, 2.0, 7.7, 68.0), (17.6, 0.05, 1.2, 250, 3.0, 7.7, 70.0) -> Crossover -> (17.6, 0.05, 1.2, 345, 3.0, 7.7, 68.0)
Figure 2: Crossover Operation
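A minimal sketch of these two operators on real-valued chromosomes like those in Figures 1 and 2; the per-gene mutation rate, the Gaussian perturbation, and the coin-flip crossover are illustrative assumptions, not the paper's exact settings.

```python
import random

def mutate(chromosome, rate=0.15, scale=0.5):
    """Perturb each gene with probability `rate` (illustrative settings)."""
    return tuple(g + random.gauss(0.0, scale) if random.random() < rate else g
                 for g in chromosome)

def crossover(parent_a, parent_b):
    """Build one child, taking each gene from either parent with equal probability."""
    return tuple(a if random.random() < 0.5 else b for a, b in zip(parent_a, parent_b))

parent_a = (19.3, 0.05, 1.2, 345, 2.0, 7.7, 68.0)
parent_b = (17.6, 0.05, 1.2, 250, 3.0, 7.7, 70.0)
child = crossover(parent_a, parent_b)  # e.g. (17.6, 0.05, 1.2, 345, 3.0, 7.7, 68.0)
```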

Genetic Algorithms
Given these five components, a genetic algorithm operates according to the following steps:
Initialize the population using the initialization procedure, and evaluate each member of the initial population.
Reproduce until a stopping criterion is met.

Genetic Algorithms
Reproduction consists of iterations of the following steps (see the sketch below):
Choose one or more parents to reproduce. Selection is stochastic, but the individuals with the highest evaluations are favored in the selection.
Choose a genetic operator and apply it to the parents.
Evaluate the children and accumulate them into a generation. After accumulating enough individuals, insert them into the population, replacing the worst current members of the population.
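A sketch of that reproduce-and-replace loop, assuming a fixed generation budget as the stopping criterion and fitness-proportional parent selection (both are illustrative choices):

```python
import random

def steady_state_ga(init_pop, evaluate, mutate, crossover, generations, gen_size=1):
    """Population is a list of (individual, fitness) pairs; higher fitness is better."""
    pop = [(ind, evaluate(ind)) for ind in init_pop]
    for _ in range(generations):
        children = []
        while len(children) < gen_size:
            # stochastic selection, biased toward the highest evaluations
            weights = [f for _, f in pop]
            pa, pb = random.choices([ind for ind, _ in pop], weights=weights, k=2)
            # choose a genetic operator and apply it to the parent(s)
            child = crossover(pa, pb) if random.random() < 0.5 else mutate(pa)
            children.append((child, evaluate(child)))
        # insert the accumulated children, replacing the worst current members
        pop.sort(key=lambda pair: pair[1], reverse=True)
        pop[-len(children):] = children
    return max(pop, key=lambda pair: pair[1])[0]
```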

Pattern Classification Problem
A problem with k features and M classes: given a set of training examples, select the most likely class for any instance not in the training set.
An instance is represented by a k-dimensional feature vector.
An example is represented by an instance and the class that the instance belongs to.

Real Examples
Handwritten character recognition
Speech recognition
Blood cell classification
...

Neural Networks
Sigmoid Feed-forward Neural Network
Weighted Probabilistic Neural Network (WPNN)

Sigmoid Feed-forward Networks
A feed-forward network is a directed graph without cycles.
A multi-layer network: neurons in each layer (except the output layer) are completely connected to the next layer.
Each neuron in the network is a sigmoid unit.

Sigmoid Network Topology
Figure 3: Typical Sigmoid Network

Sigmoid Unit
$\sigma(x)$ is the function $\dfrac{1}{1 + e^{-x}}$
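A sigmoid unit in code, tying together the weighted sum, bias, and transfer function from the component list earlier (function names are illustrative):

```python
import math

def sigmoid(x):
    """The transfer function sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neurode_state(inputs, weights, bias):
    """State of a sigmoid neurode: sigma(sum_i w_i * x_i - bias)."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) - bias)
```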

Application for Pattern Classification

Application for Pattern Classification (Cont.)
Representation:
Input representation: $\langle x_1, x_2, \ldots, x_k \rangle$
Class representation: $\langle 1, 0, \ldots, 0 \rangle$ for class 1 (example below)
Evaluation Function: the sum of squared errors
Training Algorithms: Backpropagation Algorithm, Genetic Algorithm
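A small illustration of this class representation (the helper name is an assumption):

```python
def class_target(class_index, n_classes):
    """Target vector for a class: 1 at the class position, 0 elsewhere, e.g. <1, 0, ..., 0>."""
    return [1 if i == class_index else 0 for i in range(n_classes)]

class_target(0, 4)  # class 1 of M = 4 classes -> [1, 0, 0, 0]
```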

Weighted Probabilistic Neural Network (WPNN)
WPNN is a pattern classification algorithm which falls into the broad class of nearest-neighbor-like algorithms.
Likelihood Function: distance -> probability
Note: the distance is measured from an instance to all exemplars of a particular class, not just to one example.

WPNN Likelihood Function
Let the exemplars from class $i$ be the $k$-vectors $\vec{x}^i_j$ for $j = 1, \ldots, N_i$. Then the likelihood function for class $i$ is
$L_i(\vec{x}) = \frac{1}{N_i (2\pi)^{k/2} (\det\Sigma)^{1/2}} \sum_{j=1}^{N_i} e^{-(\vec{x} - \vec{x}^i_j)^T \Sigma^{-1} (\vec{x} - \vec{x}^i_j)}$
$L_i(\vec{x})$ describes the probability that the instance $\vec{x}$ is of class $i$.
Note: the class likelihood functions are sums of identical anisotropic Gaussians centered at the exemplars, divided by $N_i$.

WPNN Conditional Probability
The value of the likelihood function for a particular instance may not, by itself, be enough to classify the instance. For example, $L_1(\vec{x}) = 0.3$ and $L_j(\vec{x}) = 0.01$ for $j = 2, 3, \ldots, M$.
The conditional probability for class $i$ is
$P_i(\vec{x}) = L_i(\vec{x}) \Big/ \sum_{j=1}^{M} L_j(\vec{x})$
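A direct transcription of the likelihood and conditional probability into code, assuming $\Sigma$ is diagonal (as restricted on the next slide) and stored as a vector of per-feature entries; NumPy is an implementation choice, not part of the original:

```python
import numpy as np

def class_likelihood(x, exemplars, sigma_diag):
    """L_i(x): sum of identical anisotropic Gaussians centered at the class exemplars."""
    n_i, k = exemplars.shape
    norm = n_i * (2 * np.pi) ** (k / 2) * np.sqrt(np.prod(sigma_diag))
    diffs = exemplars - x                                  # shape (N_i, k)
    exponents = -np.sum(diffs ** 2 / sigma_diag, axis=1)   # -(x - x_j)^T Sigma^{-1} (x - x_j)
    return np.exp(exponents).sum() / norm

def class_probabilities(x, exemplars_by_class, sigma_diag):
    """P_i(x) = L_i(x) / sum_j L_j(x)."""
    likelihoods = np.array([class_likelihood(x, ex, sigma_diag) for ex in exemplars_by_class])
    return likelihoods / likelihoods.sum()
```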

WPNN Feature Weights
Training WPNN consists of selecting the entries of the matrix $\Sigma$.
The matrix $\Sigma$ is restricted to be a diagonal matrix.
The inverse of each entry in $\Sigma$ is a weight on the corresponding feature.

WPNN Implementation
It is called a neural network because its implementation maps naturally onto a two-layer feed-forward network:
$k$ neurons in the input layer
$M$ neurons with standard normal Gaussian transfer functions in the output layer
$kN$ neurons in the single hidden layer, each with a linear function $ax - b$
where $k$ is the number of features, $M$ the number of classes, and $N$ the total number of exemplars.

WPNN Implementation (Cont.)
The $((i-1)k + j)$-th hidden neuron ($i \le N$, $j \le k$):
1. is connected to the $j$-th input neuron with weight $a = 1.0$
2. has bias $b$ equal to the value of the $j$-th feature of the $i$-th exemplar
3. is connected to the $n$-th output neuron, where $n$ is the class of the $i$-th exemplar, with weight $w_j$, where $w_j$ is the selected feature weight for the $j$-th feature.

An Example
Two classes, two features, and three examples:
No.  Instance  Class
1    <1, 2>    1
2    <2, 5>    2
3    <3, 4>    1

An Example (Cont.)
With a diagonal $\Sigma$ whose inverse entries are the feature weights $\omega_1, \ldots, \omega_k$, the likelihood function becomes
$L_i(\vec{x}) = \frac{\sqrt{\omega_1 \cdots \omega_k}}{N_i (2\pi)^{k/2}} \sum_{j=1}^{N_i} e^{-\sum_{n=1}^{k} \omega_n (x_n - x^i_{jn})^2}$
[Network diagram: $k = 2$ input neurons (2 features), $kN = 2 \cdot 3 = 6$ hidden neurons with biases $x^1_1, x^1_2, x^2_1, x^2_2, x^3_1, x^3_2$ taken from the three examples, and 2 output neurons (2 classes) with weights $\omega_1, \omega_2$ on their incoming connections.]
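A sketch of how the three-exemplar example maps onto the hidden layer of the two-layer network described above (the dictionary layout is an illustrative choice):

```python
# The example: three exemplars (instance, class) with k = 2 features and 2 classes.
examples = [((1, 2), 1), ((2, 5), 2), ((3, 4), 1)]
k = 2

# One hidden neuron per (exemplar, feature) pair: kN = 2 * 3 = 6 neurons.
hidden = []
for instance, cls in examples:
    for j in range(k):
        hidden.append({
            "input": j,           # connected to input neuron j with weight a = 1.0
            "bias": instance[j],  # bias b = value of feature j of this exemplar
            "output": cls,        # connected to the output neuron of the exemplar's class,
                                  # with the selected feature weight w_j on that connection
        })
print(len(hidden))                # 6
```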

Survey of Hybrid Systems
Supportive combinations
Supportive combinations typically involve using one of these methods to prepare data for consumption by the other. For example, using a genetic algorithm to select features for use by neural network classifiers.
Collaborative combinations
Collaborative combinations typically involve using the genetic algorithm to determine the neural network weights, the topology, or the learning algorithm.

Supportive Combinations
Using NN to assist GA
Using GA to assist NN: data preparation; evolving network parameters and learning rules
Using GA to explain and analyze NN
Using GA and NN independently

Collaborative Combinations
GA to select weights
Two basic differences between approaches: the network architecture (feedforward sigmoidal, WPNN, cascade-correlation, recurrent sigmoidal, recurrent linear threshold, feedforward with step functions, and feedback with step functions) and differences in the GA itself.

Collaborative Combinations
GA to specify NN topology
A genotype representation must be devised, and an attendant mapping from genotype to phenotype must be provided.
There must be a protocol for exposing the phenotype to the task environment.
There must be a learning method to fine-tune the network function.
There must be a fitness measure.
There must be a method for generating new genotypes.
GA to learn the NN learning algorithm

Reasons to apply GA to NN
Finding the global optimum
For recurrent networks
For networks with discontinuous functions
GA can optimize any combination of weights, biases, topology, and transfer functions
The ability to use an arbitrary evaluation function

Disadvantage and Solutions
Disadvantage: excessive computing time
Solutions:
Using specialized NN hardware
Using the best local learning algorithm
Using parallel implementations of GA
Finding the best division of labor between the local and evolutionary learning paradigms, to make the best possible use of training time

GA Formulation for Training Sigmoid FFNN
Problem Formulation
Initialization
Genetic Operators
Fitness Evaluation
Genetic Parameters

Problem Formulation
Individuals are the NNs themselves; no string encoding
Topology is fixed; weights are evolved

Population Initialization
Selection of initial weights
[Figure: plot of $e^{-|x|}$, the distribution from which initial weights are drawn]

Fitness Evaluation
Sum of Squared Error:
$E(\vec{\omega}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in \text{outputs}} (t_{kd} - o_{kd})^2$
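The sum-of-squared-errors evaluation written out as code; the network interface (a callable returning the output vector) and the data layout are assumptions:

```python
def sum_squared_error(network, training_set):
    """E(w) = 1/2 * sum over examples d and output units k of (t_kd - o_kd)^2."""
    total = 0.0
    for inputs, targets in training_set:
        outputs = network(inputs)
        total += sum((t - o) ** 2 for t, o in zip(targets, outputs))
    return 0.5 * total
```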

Genetic Operators - Sigmoid FFNN
Unbiased-Mutate-Weights - select from a probability distribution
Biased-Mutate-Weights - add in the previous weight
Mutate-Nodes - mutation grouped by node (schema preservation; sketch below)
Crossover-Weights - uniform crossover vs. point crossover
Crossover-Nodes - crossover grouped by node (schema preservation)
Crossover-Features - competing conventions
Mutate-Weakest-Nodes - non-random selection of the least contributive nodes
Hillclimb - one step in the direction of the gradient
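A sketch of the node-grouping idea behind MUTATE-NODES and CROSSOVER-NODES: the incoming weights of a node are treated as one unit so that useful building blocks (schemata) survive the operator. The weight layout (a dict mapping a node id to its list of incoming weights) and the perturbation scale are assumptions.

```python
import random

def mutate_nodes(weights_by_node, n_nodes=1, scale=0.5):
    """Mutate every incoming weight of a few randomly chosen nodes (illustrative scale)."""
    mutated = {node: list(w) for node, w in weights_by_node.items()}
    for node in random.sample(list(mutated), n_nodes):
        mutated[node] = [w + random.gauss(0.0, scale) for w in mutated[node]]
    return mutated

def crossover_nodes(parent_a, parent_b):
    """For each node, take its whole incoming-weight vector from one parent or the other."""
    return {node: list(parent_a[node] if random.random() < 0.5 else parent_b[node])
            for node in parent_a}
```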

Parameter Values
Population-Size: 50
Generation-Size: 1
Parent-Scalar: [0.89, 0.93]

Experimental Results Feedforward Sigmoidal
First, comparing the three mutation operators resulted in a clear ordering:
1. MUTATE-NODES
2. BIASED-MUTATE-WEIGHTS
3. UNBIASED-MUTATE-WEIGHTS

Experimental Results Feedforward Sigmoidal

Experimental Results Feedforward Sigmoidal (Cont.)
Second, comparing the three crossovers produced no clear winner.

Experimental Results Feedforward Sigmoidal (Cont.)
Third, when MUTATE-WEAKEST-NODES was added to a mutation and crossover operator, it improved performance only at the beginning of a run; the performance advantage fell away after a small amount of time.

Experimental Results Feedforward Sigmoidal (Cont.)
Finally, a genetic training algorithm with the operators MUTATE-NODES and CROSSOVER-NODES outperformed backpropagation on our problem.

Experimental Results Feedforward Sigmoidal (Cont.)
The backpropagation algorithm, however, is more computationally efficient than the genetic algorithm.

Five Components of the genetic algorithm with WPNN:
1. Representation: a logarithmic representation, for a large dynamic range with proportional resolution; map: $n \mapsto B^{\,n - k_0}$

2. Evaluation Function: the leave-one-out technique, a special form of cross-validation.
Performance function:
$E = \sum_{i=1}^{M} \sum_{j=1}^{N_i} \left\{ \left[ 1 - P_i(\vec{x}^i_j) \right]^2 + \sum_{q \ne i} \left[ P_q(\vec{x}^i_j) \right]^2 \right\}$
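A sketch of this performance function, reusing `class_probabilities` from the earlier likelihood sketch and recomputing the probabilities with the tested exemplar held out (leave-one-out); the data layout is an assumption:

```python
import numpy as np

def wpnn_evaluation(exemplars_by_class, sigma_diag):
    """E = sum_i sum_j { [1 - P_i(x_j^i)]^2 + sum_{q != i} P_q(x_j^i)^2 }, leave-one-out."""
    error = 0.0
    for i, exemplars in enumerate(exemplars_by_class):
        for j in range(len(exemplars)):
            held_out = exemplars[j]
            # leave the tested exemplar out of its own class before computing probabilities
            reduced = [np.delete(ex, j, axis=0) if c == i else ex
                       for c, ex in enumerate(exemplars_by_class)]
            p = class_probabilities(held_out, reduced, sigma_diag)
            error += (1.0 - p[i]) ** 2 + sum(p[q] ** 2 for q in range(len(p)) if q != i)
    return error
```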

Leave-One-Out Method
For $k = 1, 2, \ldots, K$: set Err(k) = 0, then
1. Randomly select a training data point and hide its class label.
2. Use the remaining data and the given k to predict the class label of the held-out point.
3. Set Err(k) = Err(k) + 1 if the predicted label differs from the true label.
Repeat the procedure until all training examples have been tested.
Choose the k whose Err(k) is minimal (see the sketch below).
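A compact sketch of that procedure; the classifier interface `predict(train, k, instance)` is a placeholder:

```python
def choose_k_by_loo(training_set, candidate_ks, predict):
    """training_set: list of (instance, label). Returns the k with minimal leave-one-out error."""
    err = {}
    for k in candidate_ks:
        err[k] = 0
        for idx, (instance, label) in enumerate(training_set):
            rest = training_set[:idx] + training_set[idx + 1:]  # hide the tested point
            if predict(rest, k, instance) != label:
                err[k] += 1
    return min(err, key=err.get)
```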

Empirical observations and theoretical arguments show that WPNN works best when only a small fraction of the exemplars contribute significantly. So we reject a particular $\Sigma$ if, for any exemplar $\vec{x}^i_j$,
$\sum_{\vec{x} \ne \vec{x}^i_j} e^{-(\vec{x} - \vec{x}^i_j)^T \Sigma^{-1} (\vec{x} - \vec{x}^i_j)} > \left( \sum_{i=1}^{M} N_i \right) \Big/ P$
where $P = 4$.

3. Initialization Procedure: for WPNN, the genes are integers chosen randomly in [1, K], where K depends on the desired range and resolution for the weights.

4. Genetic Operators: for WPNN, standard GA mutation and uniform crossover.
Uniform crossover example:
Individual A genotype:  a b c d e f
Individual B genotype:  q w e r t y
Offspring genotype:     q b e d t f
The gene at locus j, where $0 \le j <$ string length, is taken from either parent with equal probability for the new offspring.
Why uniform? A real-valued representation is used, and there is no particular ordering to the feature weights.

5. Parameter Values
Population-Size: 1600
Generation-Size:
1. A steady-state approach: with the large population, a steady-state GA is used, which means Generation-Size is small relative to Population-Size.
2. When using a single CPU, Generation-Size = 1.
Parent-Scalar: the smaller the Parent-Scalar, the faster the convergence. Parent-Scalar = 0.9

Sample run of a steady-state GA
1. Initial population (randomly generated):
(a) (1001010) eval = 3
(b) (0100110) eval = 3
(c) (1101011) eval = 5
(d) (0110101) eval = 4
2. New children, from crossover of the 3rd and 4th:
(a) (1101101) eval = 5
(b) (0110011) eval = 4

Experimental Results WPNN
Because WPNN was a new and untested algorithm, the experiments with WPNN centered on the overall performance rather than on the training algorithm.

Experimental Results WPNN (Cont.)
Four data sets were designed to illustrate both the advantages and the shortcomings of WPNN.

Experimental Results WPNN (Cont.)
First Data Set:
1. A training set generated during an effort to classify simulated sonar signals
2. 10 features
3. 5 classes
4. 516 total exemplars

Experimental Results WPNN (Cont.)
Second Data Set: same as the first data set, except that 5 more features are added (random numbers uniformly distributed between 0 and 1, hence containing no information relevant to classification).

Experimental Results WPNN (Cont.)
Third Data Set: same as the first data set, except that 10 irrelevant features are added (a total of 20 features).

Experimental Results WPNN (Cont.)
Fourth Data Set: has 20 features, just like the third data set, but each true feature $f_i$ is paired with one of the irrelevant features $g_i$, and the relevant and irrelevant features are mixed via the linear combinations $0.5(f_i + g_i)$ and $0.5(f_i - g_i + 1)$ (a sketch follows below).
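A sketch of how data sets 2-4 can be derived from data set 1; the array layout and the random seed are assumptions, while the uniform [0, 1] noise and the two linear combinations come from the descriptions above:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_irrelevant_features(X, n_extra):
    """Data sets 2 and 3: append uniform-[0, 1] columns that carry no class information."""
    noise = rng.uniform(0.0, 1.0, size=(X.shape[0], n_extra))
    return np.hstack([X, noise])

def mix_feature_pair(f, g):
    """Data set 4: replace a true feature f and its paired irrelevant feature g
    by the linear combinations 0.5*(f + g) and 0.5*(f - g + 1)."""
    return 0.5 * (f + g), 0.5 * (f - g + 1)
```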

Experimental Results WPNN (Cont.)
Dataset           1    2    3    4
Backpropagation   11   16   20   13
PNN               9    94   109  29
WPNN              10   11   11   25