APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES

A. Likas, K. Blekas and A. Stafylopatis
National Technical University of Athens
Department of Electrical and Computer Engineering
Computer Science Division
157 73 Zographou, Athens, Greece

Abstract. The fuzzy min-max classification network constitutes a promising pattern recognition approach that is based on hyperbox fuzzy sets and can be incrementally trained, requiring only one pass through the training set. The definition and operation of the model consider only attributes assuming continuous values. Therefore, the application of the fuzzy min-max network to a problem with continuous and discrete attributes requires the modification of its definition and operation in order to deal with the discrete dimensions. Experimental results using the modified model on a difficult pattern recognition problem establish the strengths and weaknesses of the proposed approach.

INTRODUCTION

Fuzzy min-max neural networks [2, 3] constitute one of the many models of computational intelligence that have recently been developed from research efforts aiming at synthesizing neural networks and fuzzy logic [1]. The fuzzy min-max classification neural network [2] is an on-line supervised learning classifier that is based on hyperbox fuzzy sets. A hyperbox constitutes a region in the pattern space that can be completely defined once the minimum and the maximum points along each dimension are given. Each hyperbox is associated with exactly one of the pattern classes, and all patterns that are contained within a given hyperbox are considered to have full class membership.

In the case where a pattern is not completely contained in any of the hyperboxes, a properly computed fuzzy membership function (taking values in [0, 1]) indicates the degree to which the pattern falls outside each of the hyperboxes. During operation, the hyperbox with the maximum membership value is selected and the class associated with the winning hyperbox is considered as the decision of the network. Learning in the fuzzy min-max classification network is an expansion-contraction process that consists of creating and adjusting hyperboxes (the minimum and maximum points along each dimension) and also associating a class label with each of them.

In this work, we study the performance of the fuzzy min-max classification neural network on a pattern recognition problem that involves both discrete and continuous attributes. In order to handle the discrete attributes, the definition of a hyperbox must be modified to incorporate crisp (not fuzzy) sets in the discrete dimensions. Moreover, a modification is needed in the way the membership values are computed, along with changes in the criterion under which the hyperboxes are expanded. Besides extending the definition and operation of the fuzzy min-max network, the purpose of this work is also to gain insight into the factors that affect operation and training, and to test its classification capabilities on a difficult problem.

In the following section a brief description of the operation and training of the fuzzy min-max classification network is provided, while in Section 3 the modified approach is presented. Section 4 provides experimental results from the application of the approach to a difficult classification problem. It also presents results from the comparison of the method with the backpropagation algorithm and summarizes the major advantages and drawbacks of the fuzzy min-max neural network when used as a pattern classifier.

LEARNING IN THE FUZZY MIN-MAX CLASSIFICATION NETWORK

Consider a classification problem with n continuous attributes that have been rescaled to the interval [0, 1], hence the pattern space is I^n ([0, 1]^n). Moreover, consider that there exist p classes and K hyperboxes with corresponding minimum and maximum values v_{ji} and w_{ji} respectively (j = 1, ..., K, i = 1, ..., n). Let also c_k denote the class label associated with hyperbox B_k.

When the h-th input pattern A_h = (a_{h1}, ..., a_{hn}) is presented to the network, the corresponding membership function for hyperbox B_j is ([3])

$$ b_j(A_h) = \frac{1}{n} \sum_{i=1}^{n} \left[ 1 - f(a_{hi} - w_{ji}; \gamma) - f(v_{ji} - a_{hi}; \gamma) \right] \qquad (1) $$

where f(x; γ) = xγ if 0 ≤ xγ ≤ 1, f(x; γ) = 1 if xγ > 1, and f(x; γ) = 0 if x < 0. If the input pattern A_h falls inside the hyperbox B_j then b_j(A_h) = 1; otherwise the membership decreases, and the parameter γ ≥ 1 regulates the decrease rate. As already noted, the class of the hyperbox with the maximum membership is considered as the output of the network.

In a neural network formulation, each hyperbox B_j can be considered as a hidden unit of a feedforward neural network that receives the input pattern and computes the corresponding membership value. The values v_{ji} and w_{ji} can be considered as the weights from the input to the hidden layer. The output layer contains as many output nodes as the number of classes. The weights u_{jk} (j = 1, ..., K, k = 1, ..., p) from the hidden to the output layer express the class corresponding to each hyperbox: u_{jk} = 1 if B_j is a hyperbox for class c_k, otherwise it is zero.

During learning, each training pattern A_h is presented once to the network and the following process takes place. First we find the hyperbox B_j with the maximum membership value among those that correspond to the same class as pattern A_h and meet the expansion criterion:

$$ n\theta \geq \sum_{i=1}^{n} \left( \max(w_{ji}, a_{hi}) - \min(v_{ji}, a_{hi}) \right) \qquad (2) $$

The parameter θ (0 ≤ θ ≤ 1) is a user-defined value that imposes a bound on the size of a hyperbox, and its value significantly affects the effectiveness of the training algorithm. In the case where an expandable hyperbox (of the same class) cannot be found, a new hyperbox B_k is spawned and we set w_{ki} = v_{ki} = a_{hi} for each i. Otherwise, the hyperbox B_j with the maximum membership value is expanded in order to incorporate the new pattern A_h, i.e., for each i = 1, ..., n:

$$ v_{ji}^{new} = \min(v_{ji}^{old}, a_{hi}) \qquad (3) $$
$$ w_{ji}^{new} = \max(w_{ji}^{old}, a_{hi}) \qquad (4) $$

Following the expansion of a hyperbox, an overlap test takes place to determine if any overlap exists between hyperboxes from different classes. In case such an overlap exists, it is eliminated by a contraction process during which the size of each of the overlapping hyperboxes is minimally adjusted. Details concerning the overlap test and the contraction process can be found in [2].
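
For concreteness, the following is a minimal Python sketch of the membership function (1) and of the expansion test and update of equations (2)-(4). The class layout, the function names and the default value of γ are illustrative assumptions rather than part of the original formulation; the overlap test and contraction step are omitted.

```python
import numpy as np

def ramp(x, gamma):
    # Two-parameter ramp threshold function f(x; gamma) of equation (1):
    # x*gamma clipped to the interval [0, 1].
    return np.clip(x * gamma, 0.0, 1.0)

class Hyperbox:
    """Hyperbox defined by its min point v and max point w, labelled with one class."""

    def __init__(self, pattern, label):
        a = np.asarray(pattern, dtype=float)
        self.v = a.copy()   # minimum point along each dimension
        self.w = a.copy()   # maximum point along each dimension
        self.label = label

    def membership(self, a, gamma=4.0):
        # Equation (1): degree (in [0, 1]) to which pattern a belongs to the hyperbox;
        # a pattern inside the box receives membership 1.
        a = np.asarray(a, dtype=float)
        return np.sum(1.0 - ramp(a - self.w, gamma) - ramp(self.v - a, gamma)) / a.size

    def can_expand(self, a, theta):
        # Equation (2): the expanded hyperbox must not exceed the size bound theta.
        a = np.asarray(a, dtype=float)
        return a.size * theta >= np.sum(np.maximum(self.w, a) - np.minimum(self.v, a))

    def expand(self, a):
        # Equations (3) and (4): enlarge the hyperbox minimally to include pattern a.
        a = np.asarray(a, dtype=float)
        self.v = np.minimum(self.v, a)
        self.w = np.maximum(self.w, a)
```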

From the above description it is clear that the effectiveness of the training algorithm mainly depends on two factors: the value of the parameter θ and the order in which the training patterns are presented to the network.

TREATING DISCRETE ATTRIBUTES

A basic assumption concerning the application of the fuzzy min-max classification network to a pattern recognition problem is that all attributes take continuous values. Hence, it is possible to define the pattern space (union of hyperboxes) corresponding to each class by providing the minimum and maximum attribute values along each dimension. In the case of pattern recognition problems that are based on both analog and discrete attributes, it is necessary for the discrete features to be treated in a different way. This is mainly due to the fact that it is not possible to define a meaningful ordering of the values of discrete attributes. Thus, it is not possible to apply the minimum and maximum operations on which the original fuzzy min-max neural network is based.

Consider a pattern recognition problem with n attributes (both continuous and discrete). Let D denote the set of indices of the discrete attributes and C the set of indices of the continuous attributes. Let also n_C = |C| and n_D = |D| denote the number of continuous and discrete attributes respectively, and let D_i denote the domain of each discrete feature i ∈ D. A pattern A_h = (a_{h1}, ..., a_{hn}) of this problem has the characteristic that a_{hi} ∈ [0, 1] for i ∈ C and a_{hi} ∈ D_i for i ∈ D.

In order to deal with problems characterized by such a mixture of attributes, we consider that each hyperbox B_j is described by providing the minimum v_{ji} and maximum w_{ji} attribute values for the case of continuous features (i ∈ C) and by explicitly providing a set of attribute values D_{ji} ⊆ D_i for the case of discrete features i ∈ D. Since it is not possible to define any distance measure between the possible values of discrete attributes, we cannot assign any fuzzy membership values to the elements of the sets D_{ji}. Therefore, the sets D_{ji} are crisp sets, i.e., an element either belongs to a set or not. Taking this argument into account, equation (1), providing the membership degree of a pattern A_h in a hyperbox B_j, takes the following form:

$$ b_j(A_h) = \frac{1}{n} \left\{ \sum_{i \in C} \left[ 1 - f(a_{hi} - w_{ji}; \gamma) - f(v_{ji} - a_{hi}; \gamma) \right] + \sum_{i \in D} m_{D_{ji}}(a_{hi}) \right\} \qquad (5) $$

where m_S(x) denotes the membership function corresponding to the crisp set S, which is equal to 1 if x ∈ S and equal to 0 otherwise.

In a neural network implementation, the continuous input units are connected to the hidden units via the two kinds of weights v_{ji} and w_{ji}, as mentioned in the previous section. As far as the discrete attributes are concerned, we can assign one input unit to each attribute value, which is set to 1 in case this value appears in the input pattern, while the other units corresponding to the same attribute are set equal to 0. If a specific value d_{ik} ∈ D_i belongs to the set D_{ji}, then the weight between the corresponding input unit and the hidden unit j is set equal to 1, otherwise it is 0.

During training, when a pattern A_h is presented to the network, the expansion criterion has to be modified in order to take into account both the discrete and the continuous dimensions. More specifically, we have considered two distinct expansion criteria. The first one concerns the continuous dimensions and remains the same as in the original network, given by equation (2) with n replaced by n_C, the number of continuous attributes. The second expansion criterion concerns the discrete features and has the following form:

$$ \sum_{i \in D} m_{D_{ji}}(a_{hi}) \geq \beta \qquad (6) $$

where the parameter β (0 ≤ β ≤ n_D) expresses the minimum number of discrete attributes in which the hyperbox B_j and the pattern A_h must agree in order for the hyperbox to be expanded to incorporate the pattern.

During the test for expansion, we check whether there exist expandable hyperboxes (according to the two criteria) from the same class as A_h and we expand the hyperbox with the maximum membership. If no expandable hyperbox is found, a new one B_k is spawned and we set v_{ki} = w_{ki} = a_{hi} for i ∈ C and D_{ki} = {a_{hi}} for i ∈ D. When a hyperbox is expanded, its parameters are adjusted as follows. If i ∈ C:

$$ v_{ji}^{new} = \min(v_{ji}^{old}, a_{hi}) \qquad (7) $$
$$ w_{ji}^{new} = \max(w_{ji}^{old}, a_{hi}) \qquad (8) $$

If i ∈ D:

$$ D_{ji}^{new} = D_{ji}^{old} \cup \{a_{hi}\} \qquad (9) $$
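
Building on the sketch above, the fragment below adapts the hyperbox to mixed attributes following equations (5)-(9) and outlines the single-pass presentation of a training pattern described in this section. The separation of each pattern into a continuous part `cont` and a discrete part `disc`, the helper names, and the parameter defaults are assumptions made for illustration; the overlap test and contraction are again omitted.

```python
class MixedHyperbox:
    """Hyperbox with min/max bounds on continuous features and crisp value sets
    D_ji on discrete features, as in the modified model."""

    def __init__(self, cont, disc, label):
        c = np.asarray(cont, dtype=float)
        self.v = c.copy()                 # minimum point (continuous dimensions)
        self.w = c.copy()                 # maximum point (continuous dimensions)
        self.sets = [{d} for d in disc]   # one crisp set per discrete dimension
        self.label = label

    def membership(self, cont, disc, gamma=4.0):
        # Equation (5): continuous ramp terms plus crisp-set matches, averaged over n.
        cont = np.asarray(cont, dtype=float)
        n = cont.size + len(disc)
        cont_part = np.sum(1.0 - ramp(cont - self.w, gamma) - ramp(self.v - cont, gamma))
        disc_part = sum(1.0 for d, s in zip(disc, self.sets) if d in s)
        return (cont_part + disc_part) / n

    def can_expand(self, cont, disc, theta, beta):
        # Continuous criterion (equation (2) with n replaced by n_C) and
        # discrete agreement criterion (equation (6)).
        cont = np.asarray(cont, dtype=float)
        cont_ok = cont.size * theta >= np.sum(np.maximum(self.w, cont) -
                                              np.minimum(self.v, cont))
        disc_ok = sum(1 for d, s in zip(disc, self.sets) if d in s) >= beta
        return cont_ok and disc_ok

    def expand(self, cont, disc):
        # Equations (7)-(9): enlarge the bounds and add unseen discrete values.
        cont = np.asarray(cont, dtype=float)
        self.v = np.minimum(self.v, cont)
        self.w = np.maximum(self.w, cont)
        for s, d in zip(self.sets, disc):
            s.add(d)

def present_pattern(boxes, cont, disc, label, theta, beta):
    # Single-pass learning step: expand the expandable same-class hyperbox with
    # maximum membership, otherwise spawn a new hyperbox around the pattern.
    candidates = [b for b in boxes
                  if b.label == label and b.can_expand(cont, disc, theta, beta)]
    if candidates:
        best = max(candidates, key=lambda b: b.membership(cont, disc))
        best.expand(cont, disc)
    else:
        boxes.append(MixedHyperbox(cont, disc, label))
```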

During the overlap test and contraction, the discrete dimensions are not considered, and overlap is eliminated by adjusting only the continuous dimensions of the hyperboxes following the minimum disturbance principle, as in the original network. Although it is possible to separate two hyperboxes B_j and B_k by removing common elements from some of the sets D_{ji} and D_{ki}, we have not followed this approach. The main reason is that the disturbance to the already allocated patterns would be more significant, since these sets do not contain many elements in general.

EXPERIMENTS AND CONCLUSIONS

We have studied the modified fuzzy min-max neural network classifier on a difficult classification problem concerning the assignment of credit to consumer applications. The data set (obtained from the UCI repository [5]) contains 690 examples and was originally studied by Quinlan [4] using decision trees. Each example in the data set concerns an application for credit card facilities described by 9 discrete and 6 continuous attributes, with two decision classes (either accept or reject the application). Some of the discrete attributes have large collections of possible values (one of them has 14) and there exist examples in which some attribute values are missing. As noted in [4], these data are both scanty and noisy, making accurate prediction on unseen cases a difficult task.

Two series of experiments were performed. In the first series, the data set was divided into a training set of 460 examples (containing an equal number of positive and negative cases) that were used to adjust the network hyperboxes, while the remaining 230 examples were used as a test set to estimate the performance of the resulting classifier. Each experiment in a series consisted of training the network (in a single pass) for certain values of θ and β and then computing the percentage of correct classifications over the test set. Moreover, the order of presentation of the training patterns to the network was held fixed in all experiments. Best results were found for θ = 0.237 and β = 8. For these parameter values the resulting network contained 136 hyperboxes and the success rate was 87%. It must be noted that the success rate was very sensitive both to the choice of the parameter values and to the order in which the training examples are presented. This of course constitutes a weakness of the fuzzy min-max classifier but, on the other hand, each training experiment is very fast and the process of adjusting these parameters can be performed in reasonable time.
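
As an illustration of how the cheap single-pass training enables parameter selection, the hypothetical helpers below (continuing the earlier sketches, and assuming `train_set` and `test_set` are lists of (cont, disc, label) tuples prepared from the data) perform a simple grid search over the two expansion parameters; the parameter grids shown are arbitrary examples, not the values used in the experiments.

```python
def train(patterns, theta, beta):
    # One pass over the training patterns, as in the fuzzy min-max learning scheme.
    boxes = []
    for cont, disc, label in patterns:
        present_pattern(boxes, cont, disc, label, theta, beta)
    return boxes

def classify(boxes, cont, disc):
    # The network outputs the class of the hyperbox with maximum membership.
    return max(boxes, key=lambda b: b.membership(cont, disc)).label

def accuracy(boxes, patterns):
    hits = sum(1 for cont, disc, label in patterns
               if classify(boxes, cont, disc) == label)
    return hits / len(patterns)

# Each training run is a single pass, so sweeping theta and beta is inexpensive.
results = [(accuracy(train(train_set, th, b), test_set), th, b)
           for th in (0.1, 0.2, 0.3)
           for b in (7, 8, 9)]
best_acc, best_theta, best_beta = max(results)
```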

We have also tested the classification performance in the case where the training data are presented to the network more than once, and we have found that only marginal performance improvement is obtained.

We have also used the same data set to train a multilayer perceptron using the backpropagation algorithm (the on-line version). A network with one hidden layer was considered. Several experiments were conducted for different numbers of hidden units. The best classification rate we were able to obtain was 83%, for a network of 10 hidden units and with learning rate 0.1. It must be noted that the required training time was excessively long compared to the one-shot training of the fuzzy min-max network.

During the experiments, we observed that some of the examples were 'bad', in the sense that they were very difficult to predict and, in addition, when used as part of the training set, the resulting network exhibited poorer classification performance than in the case where these examples were not used for training. For this reason, a second series of experiments was conducted on a subset of the data set (400 examples) that resulted from the removal of the bad examples. We considered a training set and a test set of size 200, each of them containing 100 positive and 100 negative examples. Best performance was obtained for θ = 0.115 and β = 8 (112 hyperboxes), with a classification rate of 97.5%. Moreover, the performance was very robust with respect to the value of θ, with the classification rate being more than 90% for all tested values. The best classification rate we obtained for this data set using the backpropagation algorithm was 89.5%.

As the experiments indicate, the fuzzy min-max classification neural network constitutes a promising method for pattern recognition problems that has the advantage of fast one-shot training, with its only drawback being its sensitivity to the parameter values used in the expansion criteria. Therefore, further research should focus on developing algorithms for automatically adjusting these parameters during training.

REFERENCES

[1] IEEE Trans. on Neural Networks, Special Issue on Fuzzy Logic and Neural Networks, vol. 3, no. 5, September 1992.

[2] P. K. Simpson, "Fuzzy Min-Max Neural Networks - Part 1: Classification," IEEE Trans. on Neural Networks, vol. 3, no. 5, pp. 776-786, September 1992.

[3] P. K. Simpson, "Fuzzy Min-Max Neural Networks - Part 2: Clustering," IEEE Trans. on Fuzzy Systems, vol. 1, no. 1, pp. 32-45, February 1993.

[4] J. R. Quinlan, "Simplifying Decision Trees," Int. J. Man-Machine Studies, vol. 27, pp. 221-234, 1987.

[5] P. M. Murphy and D. W. Aha, "UCI Repository of Machine Learning Databases," Irvine, CA: University of California, Department of Computer Science, 1992.