Fuzzy Rule Selection by Data Mining Criteria and Genetic Algorithms

Hisao Ishibuchi
Dept. of Industrial Engineering, Osaka Prefecture University
1-1 Gakuen-cho, Sakai, Osaka 599-8531, JAPAN
E-mail: hisaoi@ie.osakafu-u.ac.jp
Phone: +81-72-254-9350

Takashi Yamamoto
Dept. of Industrial Engineering, Osaka Prefecture University
1-1 Gakuen-cho, Sakai, Osaka 599-8531, JAPAN
E-mail: yama@ie.osakafu-u.ac.jp
Phone: +81-72-254-9351

Abstract

This paper shows how a small number of fuzzy rules can be selected for designing interpretable fuzzy rule-based classification systems. Our approach consists of two phases: candidate rule generation by data mining criteria and rule selection by genetic algorithms. First, a large number of candidate rules are generated and prescreened using two rule evaluation criteria from data mining. Next, a small number of fuzzy rules are selected from the candidate rules using genetic algorithms. Rule selection is formulated as an optimization problem with three objectives: to maximize the classification accuracy, to minimize the number of selected rules, and to minimize the total rule length. Thus the task of the genetic algorithms is to find non-dominated rule sets with respect to the three objectives.

1. INTRODUCTION

Fuzzy rule-based systems have been successfully applied to various fields such as control, modeling, and classification (Leondes 1999). While the main goal in the design of fuzzy rule-based systems has been performance maximization, their interpretability has also been taken into account in some recent studies (Pena-Reyes & Sipper 1999, Castillo et al. 2001, Roubos & Setnes 2001, and Casillas et al. 2002). In this paper, we consider three objectives in the design of fuzzy rule-based classification systems as in Ishibuchi, Nakashima & Murata (2001): classification accuracy, the number of fuzzy rules, and the total length of fuzzy rules. The length of a fuzzy rule is the number of its antecedent conditions (i.e., the number of attributes in its antecedent part).
The first objective is performance maximization while the others are related to interpretability. Usually human users do not want to manually check hundreds of fuzzy rules. Thus the number of fuzzy rules is closely related to the interpretability of fuzzy rule-based systems. Fuzzy rule-based systems with a small number of fuzzy rules are not always interpretable, however. Human users cannot intuitively understand long fuzzy rules with many antecedent conditions. Thus the rule length is also closely related to the interpretability of fuzzy rule-based systems. In this paper, we maximize the classification accuracy of fuzzy rule-based systems, minimize the number of fuzzy rules, and minimize the total length of fuzzy rules. Multiobjective genetic algorithms are used for finding non-dominated rule sets with respect to these three objectives.

Fuzzy rule generation methods can be categorized into two approaches according to their strategies for dividing the input space into fuzzy subspaces. One approach is based on grid-type fuzzy partitions where the domain interval of each input is divided into antecedent fuzzy sets with linguistic labels. Fig. 1 is an example of such a grid-type fuzzy partition. The other approach uses multi-dimensional antecedent fuzzy sets defined on the input space. Fig. 2 illustrates two-dimensional ellipsoidal antecedent fuzzy sets. Multi-dimensional antecedent fuzzy sets usually lead to fuzzy rule-based systems with high accuracy but low interpretability. On the other hand, fuzzy rule-based systems with high interpretability can be generated from grid-type fuzzy partitions. Since our goal is to generate interpretable fuzzy rule-based systems, we use the first approach (i.e., grid-type fuzzy partitions). As discussed in Suzuki & Furuhashi (2001), homogeneous fuzzy partitions are more interpretable than adjusted ones. Thus we use homogeneous fuzzy partitions as shown in Fig. 1.

Usually we do not know an appropriate fuzzy partition for each input.
In general, each input may have a different fuzzy partition, whereas the two axes of the input space are divided by the same fuzzy partition in Fig. 1. Moreover, general rules may use coarse fuzzy partitions while specific rules may use fine fuzzy partitions within a single fuzzy rule-based system. To handle such a situation with fuzzy partitions of different granularities, we specify each antecedent condition of fuzzy rules by choosing an antecedent fuzzy set from various fuzzy partitions for each input. In this paper, we use the four fuzzy partitions in Fig. 3, where the total number of antecedent fuzzy sets is 14. For generating short fuzzy rules with a small number of antecedent conditions, we use "don't care" as an additional antecedent fuzzy set. Thus the antecedent fuzzy set for each input is chosen from the 14 fuzzy sets in Fig. 3 and "don't care". The total number of combinations of antecedent fuzzy sets is $15^n$ for an $n$-dimensional pattern classification problem.

Figure 1: A 5 × 5 fuzzy grid of a two-dimensional input space (each of the axes $x_1$ and $x_2$ is divided into the fuzzy sets S, MS, M, ML, and L).

Figure 2: Ellipsoidal antecedent fuzzy sets (axes $x_1$ and $x_2$).

Figure 3: Four fuzzy partitions with 2, 3, 4, and 5 antecedent fuzzy sets (14 in total). The meaning of each label is as follows: S: small, MS: medium small, M: medium, ML: medium large, and L: large. The superscript of each label denotes the granularity of the corresponding fuzzy partition.

Genetic algorithm-based fuzzy rule selection (Ishibuchi, Nakashima & Murata, 2001) consists of two phases. In the first phase, a large number of candidate rules are generated from various combinations of antecedent fuzzy sets. In the second phase, subsets of the generated candidate rules are examined using genetic algorithms for finding non-dominated rule sets with respect to the above-mentioned three objectives. In Ishibuchi, Nakashima & Murata (2001), a single fuzzy partition was used for all inputs as in Fig. 1. In that case, the total number of combinations of antecedent fuzzy sets including "don't care" is $(5+1)^n$ for an $n$-dimensional pattern classification problem. This is much smaller than the $15^n$ of this paper. That is, we have many more candidate rules. It should be noted that the search space for finding non-dominated rule sets expands exponentially as the number of candidate rules increases. The efficiency of genetic algorithms deteriorates significantly with the increase in the number of candidate rules, as shown in this paper. Thus we need a trick for decreasing the number of candidate rules.
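To give a sense of the sizes involved, the counts above can be evaluated directly. The following is only an illustrative sketch: the function names are hypothetical, and n = 13 is used because it is the dimensionality of the wine data examined later in the paper.

```python
def antecedent_combinations(choices_per_input: int, n: int) -> int:
    """Number of antecedent combinations when each of the n inputs takes one
    of `choices_per_input` antecedent fuzzy sets ("don't care" included)."""
    return choices_per_input ** n


def rule_subsets(num_candidates: int) -> int:
    """Size of the rule selection search space: all subsets of the candidates."""
    return 2 ** num_candidates


n = 13  # dimensionality of the wine data
print("single 5-set partition:", antecedent_combinations(5 + 1, n))
print("four partitions (14 sets):", antecedent_combinations(14 + 1, n))
# Each additional candidate rule doubles the number of possible rule sets.
print("subsets of 20 candidates:", rule_subsets(20))
print("subsets of 21 candidates:", rule_subsets(21))
```

The last two lines illustrate why the number of candidate rules matters so much for the second phase.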
Our idea is to prescreen candidate rules based on fuzzy versions of two rule evaluation criteria for association rules (i.e., confidence and support), which have been frequently used in the field of data mining (Agrawal et al. 1996). In our prescreening procedure, fuzzy rules are divided into several groups according to their consequent classes. The fuzzy rules in each group are sorted in descending order of the product of confidence and support. Finally, a pre-specified number of fuzzy rules are chosen from the top of the rule list for each group. The selected fuzzy rules are used as candidate rules in our genetic algorithm-based rule selection method.

In the next section, we show how the design of fuzzy rule-based classification systems can be formulated as a three-objective rule selection problem. In Section 3, we propose a prescreening procedure of candidate rules using fuzzy versions of the two rule evaluation criteria in data mining. In Section 4, we describe a three-objective genetic algorithm for rule selection. The effect of the proposed prescreening procedure on the efficiency of the genetic algorithm-based rule selection method is examined in Section 5 through computer simulations. Finally, Section 6 concludes this paper.

2. PROBLEM FORMULATION

Let us consider an $M$-class pattern classification problem with $m$ labeled patterns $x_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$, in an $n$-dimensional continuous pattern space. For simplicity of explanation, we assume that the pattern space is the $n$-dimensional unit hypercube $[0, 1]^n$. That is, we assume that all attribute values are real numbers in the unit interval $[0, 1]$. For our pattern classification problem, we use fuzzy rules of the following form:

$$\text{Rule } R_q\text{: If } x_1 \text{ is } A_{q1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{qn} \text{ then Class } C_q \text{ with } CF_q, \tag{1}$$

where $R_q$ is the $q$-th fuzzy rule, $x = (x_1, \ldots, x_n)$ is an $n$-dimensional pattern vector, $A_{qi}$ is an antecedent fuzzy set, $C_q$ is a consequent class (i.e., one of the $M$ classes), and $CF_q$ is a rule weight (i.e., certainty factor). The antecedent fuzzy set $A_{qi}$ is one of the 14 fuzzy sets in Fig. 3 or "don't care". The rule weight $CF_q$ is a real number in the unit interval $[0, 1]$. As shown in the next section, the consequent class $C_q$ and the rule weight $CF_q$ are determined in a heuristic manner from the training patterns compatible with the antecedent part of $R_q$.

Let $S$ be a subset of the $15^n$ fuzzy rules of the form (1). Our task is to find rule sets with high classification ability and high interpretability. This task can be rephrased as finding a small number of simple fuzzy rules with high classification ability. As in Ishibuchi, Nakashima & Murata (2001), our rule selection problem is formulated as the following three-objective optimization problem:

$$\text{Maximize } f_1(S), \text{ and minimize } f_2(S),\ f_3(S), \tag{2}$$

where $f_1(S)$ is the number of training patterns correctly classified by $S$, $f_2(S)$ is the number of fuzzy rules in $S$, and $f_3(S)$ is the total rule length of the fuzzy rules in $S$. Usually there is no rule set that is optimal with respect to all three objectives. Thus our task is to find multiple rule sets that are not dominated by any other rule sets. A rule set $S_B$ is said to dominate another rule set $S_A$ (i.e., $S_B$ is better than $S_A$: $S_A \prec S_B$) if all the following inequalities hold:

$$f_1(S_A) \le f_1(S_B), \tag{3}$$
$$f_2(S_A) \ge f_2(S_B), \tag{4}$$
$$f_3(S_A) \ge f_3(S_B), \tag{5}$$

and at least one of the following inequalities holds:

$$f_1(S_A) < f_1(S_B), \tag{6}$$
$$f_2(S_A) > f_2(S_B), \tag{7}$$
$$f_3(S_A) > f_3(S_B). \tag{8}$$

The first condition (i.e., all three inequalities in (3)-(5)) means that no objective of $S_B$ is worse than that of $S_A$ (i.e., $S_B$ is not worse than $S_A$). The second condition (i.e., one of the three inequalities in (6)-(8)) means that at least one objective of $S_B$ is better than that of $S_A$.
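The dominance relation in (3)-(8) translates almost literally into code. The sketch below is illustrative only: the function names and the objective values in the example are hypothetical, and each rule set is represented just by its objective triple $(f_1, f_2, f_3)$.

```python
from typing import List, Tuple

Objectives = Tuple[float, float, float]  # (f1, f2, f3) of one rule set


def dominates(b: Objectives, a: Objectives) -> bool:
    """True if rule set B dominates rule set A in the sense of (3)-(8)."""
    f1_a, f2_a, f3_a = a
    f1_b, f2_b, f3_b = b
    # (3)-(5): B is not worse than A in any objective.
    not_worse = f1_a <= f1_b and f2_a >= f2_b and f3_a >= f3_b
    # (6)-(8): B is strictly better than A in at least one objective.
    better = f1_a < f1_b or f2_a > f2_b or f3_a > f3_b
    return not_worse and better


def non_dominated(rule_sets: List[Objectives]) -> List[Objectives]:
    """Keep the rule sets that no other rule set in the list dominates."""
    return [a for a in rule_sets if not any(dominates(b, a) for b in rule_sets)]


# Hypothetical objective vectors: (correctly classified, rules, total length).
example = [(170, 3, 3), (168, 3, 4), (160, 2, 2)]
print(non_dominated(example))
```

In this example the second rule set is dominated by the first (fewer correct classifications and a larger total length with the same number of rules), while the first and third are mutually non-dominated because each is better in some objective.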
When a rule set $S$ is not dominated by any other rule set, $S$ is said to be a Pareto-optimal solution of our rule selection problem in (2). In many cases, it is impractical to try to find true Pareto-optimal solutions of our rule selection problem, whose search space is huge (i.e., the search space is the power set of the $15^n$ fuzzy rules). Thus we try to find near Pareto-optimal solutions. More specifically, first we decrease the search space by prescreening candidate fuzzy rules. Then we search for near Pareto-optimal solutions by a three-objective genetic algorithm.

3. CANDIDATE RULE PRESCREENING

3.1 FUZZIFICATION OF ASSOCIATION RULES

As we have already explained, the total number of combinations of antecedent fuzzy sets is $15^n$ for our $n$-dimensional pattern classification problem. When $n$ is small (e.g., $n \le 4$), we can examine all combinations of antecedent fuzzy sets for generating fuzzy rules and use all the generated fuzzy rules as candidate rules in our genetic algorithm-based rule selection method. That is, no prescreening of candidate rules is necessary. On the other hand, we need a prescreening procedure when $n$ is large. It is time-consuming to examine all the $15^n$ combinations when $n$ is large (e.g., $n = 13$ in the wine data used in the computer simulations of this paper). In this case, it is also impractical to use all the generated fuzzy rules as candidate rules in our genetic algorithm-based rule selection method. Our idea is to use rule evaluation criteria from data mining for decreasing the number of candidate rules.

In the area of data mining, two criteria called confidence and support have often been used for evaluating association rules (Agrawal et al. 1996). Our fuzzy rule in (1) can be viewed as an association rule of the form $A_q \Rightarrow C_q$. We use the two criteria for prescreening candidate rules. In this subsection, we show how the definitions of these two criteria can be extended to the case of the fuzzy association rule $A_q \Rightarrow C_q$ (Ishibuchi, Yamamoto & Nakashima, 2001). Similar extensions of the two criteria to fuzzy association rules were also proposed in Hong et al. (2001).
Let $D$ be the set of the given $m$ training patterns $x_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$. The cardinality of $D$ is $m$ (i.e., $|D| = m$). The confidence of $A_q \Rightarrow C_q$ is defined as follows (Agrawal et al. 1996):

$$c(A_q \Rightarrow C_q) = \frac{|D(A_q) \cap D(C_q)|}{|D(A_q)|}, \tag{9}$$

where the denominator $|D(A_q)|$ is the number of training patterns compatible with the antecedent part $A_q$, and the numerator $|D(A_q) \cap D(C_q)|$ is the number of training patterns compatible with both the antecedent part $A_q$ and the consequent class $C_q$. The confidence $c$ indicates the grade of the validity of $A_q \Rightarrow C_q$. That is, $c$ ($\times 100\%$) of the training patterns compatible with $A_q$ are also compatible with $C_q$. In the case of standard association rules, neither $A_q$ nor $C_q$ is fuzzy. Thus the calculations of $|D(A_q)|$ and $|D(A_q) \cap D(C_q)|$ can be performed by simply counting compatible training patterns. On the other hand, each training pattern has a different compatibility grade $\mu_{A_q}(x_p)$ with the antecedent part $A_q$ when $A_q \Rightarrow C_q$ is a fuzzy association rule. Thus such a compatibility grade should be taken into account. Since the consequent class $C_q$ is not fuzzy, the confidence in (9) can be rewritten as follows (Ishibuchi, Yamamoto & Nakashima 2001):

$$c(A_q \Rightarrow C_q) = \frac{|D(A_q) \cap D(C_q)|}{|D(A_q)|} = \frac{\sum_{p \in \text{Class } C_q} \mu_{A_q}(x_p)}{\sum_{p=1}^{m} \mu_{A_q}(x_p)}. \tag{10}$$

The compatibility grade $\mu_{A_q}(x_p)$ is usually defined by the product or minimum operator. In this paper, we use the product operator as

$$\mu_{A_q}(x_p) = \mu_{A_{q1}}(x_{p1}) \times \cdots \times \mu_{A_{qn}}(x_{pn}), \tag{11}$$

where $\mu_{A_{qi}}(\cdot)$ is the membership function of the antecedent fuzzy set $A_{qi}$ (i.e., each triangle in Fig. 3).

On the other hand, the support of $A_q \Rightarrow C_q$ is defined as follows (Agrawal et al. 1996):

$$s(A_q \Rightarrow C_q) = \frac{|D(A_q) \cap D(C_q)|}{|D|}. \tag{12}$$

The support $s$ indicates the grade of the coverage by $A_q \Rightarrow C_q$. That is, $s$ ($\times 100\%$) of all the training patterns are compatible with the association rule $A_q \Rightarrow C_q$ (i.e., compatible with both $A_q$ and $C_q$). In the same manner as the confidence in (10), the support in (12) can be rewritten as follows (Ishibuchi, Yamamoto & Nakashima 2001):

$$s(A_q \Rightarrow C_q) = \frac{|D(A_q) \cap D(C_q)|}{|D|} = \frac{\sum_{p \in \text{Class } C_q} \mu_{A_q}(x_p)}{m}. \tag{13}$$

3.2 CONSEQUENT CLASS AND RULE WEIGHT

The consequent class $C_q$ of the fuzzy rule $R_q$ with the antecedent part $A_q$ is determined as

$$c(A_q \Rightarrow C_q) = \max\{c(A_q \Rightarrow \text{Class } 1), \ldots, c(A_q \Rightarrow \text{Class } M)\}. \tag{14}$$

That is, the consequent class has the maximum confidence among the $M$ alternative classes. It should be noted that the same class $C_q$ is obtained for $A_q$ when we use the support $s$ instead of the confidence $c$. This is because the following relation holds between the confidence $c$ and the support $s$ from their definitions:

$$s(A_q \Rightarrow \text{Class } h) = c(A_q \Rightarrow \text{Class } h) \cdot \frac{|D(A_q)|}{|D|}, \quad h = 1, 2, \ldots, M. \tag{15}$$

Since the second factor (i.e., $|D(A_q)| / |D|$) on the right-hand side is independent of the consequent class, the class with the maximum confidence is the same as the class with the maximum support. The same class also has the maximum product of these two criteria. Usually we can uniquely specify the consequent class $C_q$ for each combination $A_q$ of antecedent fuzzy sets.
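The fuzzy confidence (10), the product-based compatibility grade (11), the fuzzy support (13), and the choice of the consequent class in (14) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triangular membership functions are assumed to be evenly spaced on $[0, 1]$ with peaks at $k/(K-1)$ for a partition with $K$ fuzzy sets, "don't care" is assumed to have membership 1 everywhere, and all names and data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# An antecedent fuzzy set is (K, k): the k-th triangle (k = 0..K-1) of an
# evenly spaced partition with K sets on [0, 1] (an assumption of this sketch).
# None stands for "don't care".
FuzzySet = Optional[Tuple[int, int]]


def membership(fuzzy_set: FuzzySet, x: float) -> float:
    """Triangular membership value; "don't care" is assumed to return 1."""
    if fuzzy_set is None:
        return 1.0
    granularity, index = fuzzy_set
    peak = index / (granularity - 1)
    width = 1.0 / (granularity - 1)
    return max(0.0, 1.0 - abs(x - peak) / width)


def compatibility(antecedent: Sequence[FuzzySet], pattern: Sequence[float]) -> float:
    """Compatibility grade by the product operator, as in (11)."""
    grade = 1.0
    for fuzzy_set, value in zip(antecedent, pattern):
        grade *= membership(fuzzy_set, value)
    return grade


def confidence_and_support(antecedent, patterns, labels, target_class) -> Tuple[float, float]:
    """Fuzzy confidence (10) and support (13) of antecedent => target_class."""
    grades = [compatibility(antecedent, x) for x in patterns]
    total = sum(grades)
    in_class = sum(g for g, y in zip(grades, labels) if y == target_class)
    confidence = in_class / total if total > 0.0 else 0.0
    support = in_class / len(patterns)
    return confidence, support


def consequent_class(antecedent, patterns, labels, classes) -> Optional[int]:
    """Class with the maximum confidence as in (14); None if it is not unique."""
    scores = [(confidence_and_support(antecedent, patterns, labels, c)[0], c) for c in classes]
    best = max(score for score, _ in scores)
    winners = [c for score, c in scores if score == best]
    return winners[0] if len(winners) == 1 and best > 0.0 else None


# Toy data: three two-dimensional patterns from two classes.
patterns = [(0.0, 0.5), (0.25, 0.5), (1.0, 0.5)]
labels = [1, 1, 2]
antecedent: List[FuzzySet] = [(3, 0), None]  # "x1 is small (K = 3), x2 is don't care"
print(confidence_and_support(antecedent, patterns, labels, 1))
print(consequent_class(antecedent, patterns, labels, [1, 2]))
```

Because only the numerator of (10) and (13) depends on the class, the class returned by `consequent_class` would be the same if the support were used instead, which is the observation made around (15).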
Only when multiple classes have the same maximum confidence (including the case of no training pattern compatible with the antecedent part $A_q$: $c(A_q \Rightarrow \text{Class } h) = 0$ for all classes) can we not specify the consequent class $C_q$ for $A_q$. In this case, we do not generate the corresponding fuzzy rule $R_q$.

The confidence of $R_q$ can be directly used as its rule weight as in Cordon et al. (1999). Our preliminary simulation results showed that better results were obtained from the following definition of the rule weight than from the direct use of the confidence:

$$CF_q = c(A_q \Rightarrow C_q) - c_{\text{Second}}, \tag{16}$$

where $c_{\text{Second}}$ is the second largest confidence for the antecedent part $A_q$:

$$c_{\text{Second}} = \max\{c(A_q \Rightarrow \text{Class } h) \mid \text{Class } h \ne C_q\}. \tag{17}$$

Our preliminary computer simulations also showed that better results were obtained from the definition in (16) than from the following definition used in some studies on fuzzy rule-based classification systems (e.g., Ishibuchi, Yamamoto & Nakashima 2001):

$$CF_q = c(A_q \Rightarrow C_q) - c_{\text{Average}}, \tag{18}$$

where $c_{\text{Average}}$ is the average confidence over fuzzy rules with the same antecedent part $A_q$ but different consequent classes:

$$c_{\text{Average}} = \frac{1}{M-1} \sum_{\text{Class } h \ne C_q} c(A_q \Rightarrow \text{Class } h). \tag{19}$$

3.3 PRESCREENING PROCEDURE

The generated fuzzy rules are divided into $M$ groups according to their consequent classes. The fuzzy rules in each group are sorted in descending order of the product of the confidence and the support (i.e., $s \cdot c$). For selecting $N$ candidate rules, the first $N/M$ rules are chosen from each of the $M$ groups. In this manner, we can choose a pre-specified number of fuzzy rules as candidate rules in our genetic algorithm-based rule selection method. In our preliminary computer simulations, we also examined the confidence and the support as rule prescreening criteria. The best result among the three criteria for rule prescreening (i.e., confidence, support, and their product) was obtained when we used the product of the confidence and the support.

As we have already mentioned, the total number of combinations of antecedent fuzzy sets is $15^n$ for our $n$-dimensional pattern classification problem. Thus it is impractical to examine all combinations when $n$ is large. In this case, we examine only short fuzzy rules with only a few antecedent conditions (i.e., with many "don't care" conditions). The number of fuzzy rules of length $L$ is calculated as ${}_{n}C_{L} \times 14^{L}$ because we have 14 antecedent fuzzy sets for each input (excluding "don't care"). Even when $n$ is large, ${}_{n}C_{L} \times 14^{L}$ is not so large for a small value of $L$. This means that the number of short fuzzy rules is not so large even when the total number of fuzzy rules is huge.

4. GENETIC ALGORITHM

Many genetic algorithms for multi-objective optimization problems have been proposed in the literature (Zitzler & Thiele 1999, and Zitzler et al. 2000). Since each rule set can be represented by a binary string, we can apply those algorithms to our three-objective rule selection problem in Section 2. In this paper, we use a slightly modified version of the three-objective genetic algorithm for rule selection in Ishibuchi, Nakashima & Murata (2001). This algorithm has two characteristic features. One is to use a scalar fitness function with variable random weights for evaluating each string (i.e., each rule set). Whenever a pair of parent solutions is selected for crossover, the weights are randomly updated. That is, each selection is governed by a different weight vector. Genetic search in various directions in the three-dimensional objective space is realized by this random weighting scheme.
The other characteristic feature is to store all non-dominated solutions as a secondary population separately from the current population. The secondary population is updated at every generation. A small number of non-dominated solutions are randomly chosen from the secondary population and their copies are added to the current population as elite solutions. The convergence speed of the current population to Pareto-optimal solutions is improved by this elitist strategy. Other parts of our three-objective genetic algorithm are the same as in standard single-objective genetic algorithms. Note that our task is to find multiple non-dominated solutions while the task of standard genetic algorithms is to find a single optimal solution. Of course, we can use other multi-objective genetic algorithms proposed in the literature.

An arbitrary subset $S$ of the $N$ candidate fuzzy rules can be represented by a binary string of length $N$ as

$$S = s_1 s_2 \cdots s_N, \tag{20}$$

where $s_q = 0$ means that the $q$-th rule $R_q$ is not included in $S$ while $s_q = 1$ means that $R_q$ is included in $S$. An initial population is constructed by randomly generating a pre-specified number of binary strings of length $N$.

The first objective $f_1(S)$ of each string $S$ is calculated by classifying all the given training patterns by $S$. We use a fuzzy reasoning method based on a single winner rule as in Ishibuchi, Nakashima & Murata (2001). In this fuzzy reasoning method, the classification of each pattern by the rule set $S$ is performed by finding a single winner rule with the maximum product of the rule weight and the compatibility grade with that pattern. There are many cases where some fuzzy rules in $S$ are not chosen as winner rules for any patterns. We can remove those fuzzy rules from $S$ without degrading the classification accuracy of $S$. At the same time, the second and third objectives are improved by removing unnecessary fuzzy rules. Thus we remove from the rule set $S$ all fuzzy rules that are not selected as winner rules of any patterns.
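The evaluation of one string by the single winner rule method, together with the removal of rules that never win, might look like the sketch below. It is illustrative only: the compatibility grades are assumed to be precomputed in a matrix `compat[q][p]`, ties are resolved in favor of the rule with the lowest index (the text does not say how ties are handled), a pattern with no compatible rule is simply left unclassified, and all names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluate_rule_set(
    string: Sequence[int],
    compat: Sequence[Sequence[float]],
    rule_class: Sequence[int],
    rule_weight: Sequence[float],
    rule_length: Sequence[int],
    labels: Sequence[int],
) -> Tuple[int, int, int, List[int]]:
    """Return (f1, f2, f3, pruned_string) for the rule set encoded by `string`.

    compat[q][p] is the compatibility grade of rule q with training pattern p.
    """
    used = [0] * len(string)  # 1 for rules that win at least one pattern
    correct = 0
    for p, label in enumerate(labels):
        winner, best = None, 0.0
        for q, bit in enumerate(string):
            # Winner rule: maximum product of compatibility grade and rule weight.
            score = compat[q][p] * rule_weight[q] if bit else 0.0
            if score > best:
                winner, best = q, score
        if winner is None:
            continue  # no compatible rule: the pattern is not classified
        used[winner] = 1
        if rule_class[winner] == label:
            correct += 1
    # Rules that never won are dropped before f2 and f3 are computed.
    f2 = sum(used)
    f3 = sum(length for length, bit in zip(rule_length, used) if bit)
    return correct, f2, f3, used


# Toy example with three candidate rules and three training patterns.
compat = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0], [0.1, 0.1, 0.0]]
result = evaluate_rule_set(
    string=[1, 1, 1],
    compat=compat,
    rule_class=[1, 2, 1],
    rule_weight=[0.5, 0.5, 0.1],
    rule_length=[1, 2, 3],
    labels=[1, 2, 2],
)
print(result)
```

In the toy example the third rule never wins, so its bit is cleared in the returned string and it contributes to neither the number of rules nor the total rule length.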
The removal of those rules is performed for each string in the current population by changing the corresponding 1's to 0's before the second and third objectives are calculated.

After the three objectives of each string (i.e., each rule set) in the current population are calculated, the secondary population of non-dominated rule sets is updated. That is, each rule set in the current population is examined to see whether it is dominated by other rule sets in the current and secondary populations. If it is not dominated by any other rule set, its copy is added to the secondary population. Then all rule sets dominated by the newly added one are removed from the secondary population. In this manner, the secondary population is updated at every generation.

The fitness value of each rule set $S$ in the current population is defined by the three objectives as

$$\text{fitness}(S) = w_1 f_1(S) - w_2 f_2(S) - w_3 f_3(S), \tag{21}$$

where $w_1$, $w_2$ and $w_3$ are weights satisfying the following conditions:

$$w_1, w_2, w_3 \ge 0, \tag{22}$$
$$w_1 + w_2 + w_3 = 1. \tag{23}$$

Whenever a pair of parent strings is selected from the current population, these weights are randomly updated. The purpose of this random specification of the weights is to search for a variety of non-dominated rule sets in the three-dimensional objective space. Binary tournament selection with replacement is used for selecting a pair of parent strings using the scalar fitness function in (21) with the randomly specified weights. That is, two strings are randomly selected from the current population and the better one is chosen as a parent string. Then the two strings are returned to the current population. The other parent string is also selected in the same manner using the same weight values. When another pair of parent strings is selected, the weight values are randomly updated.

Uniform crossover is applied to each pair of parent strings to generate a new string. Then biased mutation is applied to the generated string for efficiently decreasing the number of fuzzy rules included in the string. In the biased mutation operation, a larger probability is assigned to the mutation from 1 to 0 (i.e., mutation for decreasing the number of fuzzy rules) than to the mutation from 0 to 1 (i.e., mutation for increasing the number of fuzzy rules).

The next population consists of the strings newly generated by the genetic operations. Some non-dominated strings in the secondary population are randomly selected as elite solutions and their copies are added to the new population.

The outline of the three-objective genetic algorithm for rule selection is written as follows:

Step 0: Parameter Specification. Specify the population size $N_{\text{pop}}$, the number of elite solutions $N_{\text{elite}}$ that are randomly selected from the secondary population and added to the current population, the crossover probability $p_c$, the two mutation probabilities $p_m(1 \to 0)$ and $p_m(0 \to 1)$, and the stopping condition.

Step 1: Initialization. Randomly generate $N_{\text{pop}}$ binary strings of length $N$ as an initial population. Calculate the three objectives of each string. In this calculation, unnecessary rules are removed from each string. Find the non-dominated strings (i.e., non-dominated rule sets) in the initial population. A secondary population consists of copies of those non-dominated strings.

Step 2: Genetic Operations. Generate $(N_{\text{pop}} - N_{\text{elite}})$ strings using genetic operations (i.e., binary tournament selection, uniform crossover, and biased mutation) from the current population.

Step 3: Evaluation. Calculate the three objectives of each of the newly generated $(N_{\text{pop}} - N_{\text{elite}})$ strings. In this calculation, unnecessary rules are removed from each string. The current population consists of the modified strings.

Step 4: Secondary Population Update. Update the secondary population by examining each string in the current population as mentioned above.

Step 5: Elitist Strategy. Randomly select $N_{\text{elite}}$ strings from the secondary population and add their copies to the current population.

Step 6: Termination Test. If the stopping condition is not satisfied, return to Step 2. Otherwise terminate the execution of the algorithm. All the non-dominated strings among those examined during the execution of the algorithm are stored in the secondary population.

5. COMPUTER SIMULATIONS

We apply the proposed rule selection method to the wine data available from the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/mlsummary.html). The wine data set consists of 178 samples with 13 continuous attributes from three classes. We normalized each attribute value into a real number in the unit interval $[0, 1]$. Thus the wine data set was handled as a three-class pattern classification problem in the 13-dimensional unit hypercube $[0, 1]^{13}$. The total number of possible combinations of antecedent fuzzy sets is $15^{13}$.

First we generated fuzzy rules of length three or less using all the 178 samples as training patterns. The number of generated fuzzy rules of each length is summarized in Table 1. The fuzzy rule of length zero has no antecedent conditions, a Class 2 consequent, and a very small certainty grade (i.e., rule weight). This fuzzy rule can be generated because the number of Class 2 samples is the largest among the three classes in the wine data.

Table 1: The number of generated fuzzy rules of each length.

Length of rules    0    1      2         3          Total
Number of rules    1    182    14,781    696,752    711,716

The generated 711,716 fuzzy rules were divided into three groups according to their consequent classes. The fuzzy rules in each group were sorted in descending order of the product of the confidence and the support.
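The grouping and sorting just described, followed by the selection of the top $N/M$ rules per class from Section 3.3, can be sketched as below. This is only an illustration: the tuple layout of a rule and the toy values are hypothetical, and the confidence and support of each rule are assumed to have been computed beforehand.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (rule identifier, consequent class, confidence, support); a hypothetical layout.
Rule = Tuple[str, int, float, float]


def prescreen(rules: List[Rule], num_candidates: int, num_classes: int) -> List[Rule]:
    """Choose the top N/M rules of each class, ranked by confidence * support."""
    per_class = num_candidates // num_classes
    groups: Dict[int, List[Rule]] = defaultdict(list)
    for rule in rules:
        groups[rule[1]].append(rule)  # group by consequent class
    selected: List[Rule] = []
    for cls in sorted(groups):
        ranked = sorted(groups[cls], key=lambda r: r[2] * r[3], reverse=True)
        selected.extend(ranked[:per_class])
    return selected


# Toy rule list with two classes; select N = 4 candidates (2 per class).
toy_rules = [
    ("a", 1, 0.9, 0.1),
    ("b", 1, 0.6, 0.3),
    ("c", 1, 0.5, 0.1),
    ("d", 2, 0.8, 0.2),
    ("e", 2, 0.4, 0.1),
]
print([rule[0] for rule in prescreen(toy_rules, num_candidates=4, num_classes=2)])
```

Note that rule "a" has the highest confidence in its class but ranks below rule "b", whose larger support gives it a larger product.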
From each group, the first 300 fuzzy rules were selected as candidate rules (N = 900: 900 candidate rules in total). Then the three-objective genetic algorithm was applied to the 900 candidate rules using the following parameter specifications.

Population size: N_pop = 50,
Number of elite solutions: N_elite = 5,
Crossover probability: p_c = 0.9,
Mutation probability: p_m(1 → 0) = 0.1, p_m(0 → 1) = 1/N,
Stopping condition: 10,000 generations.

Our computer simulations were iterated 20 times. Non-dominated rule sets obtained from those 20 trials are summarized in Table 2. Examples of the obtained rule sets in Table 2 are shown in Fig. 4 and Fig. 5. Fig. 4 shows three fuzzy rules with only a single antecedent condition, which correspond to the second rule set with a 94.9% classification rate in Table 2. Fig. 5 shows three fuzzy rules with a few antecedent conditions, which correspond to the sixth rule set with a 100% classification rate in Table 2.

Table 2: Non-dominated rule sets obtained from 20 trials of the proposed method with 900 candidate rules.

Number of rules    Average rule length    Classification rate (%)
3                  0.67                   88.2
3                  1.00                   94.9
3                  1.33                   96.1
3                  1.67                   98.3
3                  2.00                   99.4
3                  2.33                   100
4                  0.75                   96.1
4                  1.00                   97.2
4                  1.25                   98.9

Figure 4: Three fuzzy rules with a 94.9% classification rate. [Rules R1, R2, R3; attributes x1, x7, x13; consequents Class 1 (0.39), Class 2 (0.31), Class 3 (0.29).]

Figure 5: Three fuzzy rules with a 100% classification rate. [Rules R1, R2, R3; attributes x1, x5, x7, x10, x11, x13; consequents Class 1 (0.25), Class 2 (0.77), Class 3 (0.89).]

From Table 2, we can see that our rule selection method found various rule sets with different classification rates and different sizes. The selected rule sets have high interpretability as shown in Fig. 4 and Fig. 5. From the comparison between Fig. 4 and Fig. 5, we can observe the existence of a tradeoff between classification accuracy and interpretability (i.e., the three fuzzy rules in Fig. 5 have a higher classification rate but are less interpretable).

For examining the usefulness of the proposed prescreening procedure of candidate rules, the same computer simulation was performed using 900 candidate rules randomly selected from the generated 711,716 fuzzy rules. Simulation results are summarized in Table 3. From the comparison between Table 2 and Table 3, we can see that the classification ability and/or the interpretability of the obtained rule sets were degraded by the use of randomly selected candidate rules.

Table 3: Simulation results with randomly selected 900 candidate rules.

Number of rules    Average rule length    Classification rate (%)
3                  1.67                   86.5
3                  2.00                   93.3
3                  2.33                   95.5
3                  2.67                   96.1
4                  2.25                   96.6
4                  2.50                   97.2
4                  2.75                   97.8
5                  2.40                   98.3
5                  2.60                   98.9
6                  2.50                   99.4
7                  2.57                   100
8                  2.13                   100

We also performed the same computer simulation using no prescreening procedure. In this case, all the generated 711,716 fuzzy rules were used as candidate rules. Thus the string length was 711,716. As we can expect, the execution of the three-objective genetic algorithm with such a long string required large memory storage and long CPU time. Table 4 shows non-dominated rule sets obtained from ten trials of the three-objective genetic algorithm. Since the search space was too large, good rule sets could not be obtained within a reasonable computation time (especially with respect to the number of fuzzy rules as shown in Table 4). The average CPU time for each trial was about 11 hours in Table 4 while it was about four minutes in Table 2 with 900 candidate rules selected by the proposed prescreening procedure.

Table 4: Simulation results with 711,716 candidate rules.

Number of rules    Average rule length    Classification rate (%)
5                  1.40                   94.4
5                  1.60                   96.1
6                  1.50                   96.6
6                  1.83                   98.3
7                  1.71                   100

Finally we examined the effect of using various fuzzy partitions for each input on the classification performance of fuzzy rule-based classification systems. In the same manner as the computer simulation for Table 2, we applied our rule selection method to the wine data set using only the finest fuzzy partition with five linguistic labels in Fig. 3 (i.e., the bottom-right fuzzy partition in Fig. 3). Table 5 shows non-dominated rule sets obtained from 20 trials. From the comparison between Table 2 and Table 5, we can see that smaller rule sets with higher classification rates were obtained in Table 2 than in Table 5. This result was expected from the fact that the three fuzzy rules with a 100% classification rate in Fig. 5 use various fuzzy partitions with different granularities.

Table 5: Non-dominated rule sets obtained from 20 trials using only a single fuzzy partition with five fuzzy sets.

Number of rules    Average rule length    Classification rate (%)
3                  0.67                   85.4
3                  1.00                   91.6
3                  1.33                   93.3
4                  1.00                   95.5
4                  1.25                   96.1
4                  1.50                   97.2
5                  1.00                   97.2
5                  1.40                   97.8
5                  1.60                   98.3
5                  1.80                   98.9
6                  1.00                   97.8
6                  1.17                   98.3
6                  1.33                   98.9
6                  1.50                   99.4
7                  1.57                   100

6. CONCLUSIONS

In this paper, we extended the genetic algorithm-based rule selection method in Ishibuchi, Nakashima & Murata (2001) to the case where various fuzzy partitions with different granularities are used for each input. This extension leads to an increase in the number of candidate rules. Thus we proposed a prescreening procedure for decreasing the number of candidate rules. The proposed prescreening procedure is based on two rule evaluation criteria of association rules in the field of data mining. Through computer simulations, we demonstrated the necessity of candidate rule prescreening in genetic algorithm-based rule selection. The three-objective genetic algorithm could not find good rule sets when candidate rules were randomly chosen. In the case of no prescreening, the CPU time was very long (i.e., about 11 hours) while it was a few minutes in the case with the proposed prescreening procedure.

REFERENCES

R. Agrawal et al. (1996) Fast discovery of association rules, in U. M. Fayyad et al. (eds.) Advances in Knowledge Discovery & Data Mining, 307-328, AAAI Press, Menlo Park.
J. Casillas, O. Cordón, F. Herrera, and L. Magdalena (2002) Trade-off between Accuracy and Interpretability in Fuzzy Rule-Based Modeling, Physica-Verlag.
L. Castillo, A. Gonzalez, and P.
Perez (2001) Including a simplicity criterion in the selection of the best rule in a genetic fuzzy learning algorithm, Fuzzy Sets and Systems 120, 309-321.
O. Cordon, M. J. del Jesus, and F. Herrera (1999) A proposal on reasoning methods in fuzzy rule-based classification systems, International Journal of Approximate Reasoning 20, 21-45.
U. M. Fayyad and K. B. Irani (1993) Multi-interval discretization of continuous-valued attributes for classification learning, Proc. of 13th International Joint Conference on Artificial Intelligence, 1022-1027.
T.-P. Hong, C.-S. Kuo, and S.-C. Chi (2001) Trade-off between computation time and number of rules for fuzzy mining from quantitative data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9, 587-604.
H. Ishibuchi, T. Nakashima, and T. Murata (2001) Three-objective genetics-based machine learning for linguistic rule extraction, Information Sciences 136, 109-133.
H. Ishibuchi, T. Yamamoto, and T. Nakashima (2001) Fuzzy data mining: Effect of fuzzy discretization, Proc. of 1st IEEE International Conference on Data Mining, 241-248.
C. T. Leondes (1999) Fuzzy Theory Systems: Techniques and Applications (Vols. 1-4), Academic Press, San Diego.
C. A. Pena-Reyes and M. Sipper (1999) Designing breast cancer diagnostic systems via a hybrid fuzzy-genetic methodology, Proc. of IEEE International Conference on Fuzzy Systems 1, 135-139.
J. R. Quinlan (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo.
H. Roubos and M. Setnes (2001) Compact and transparent fuzzy models and classifiers through iterative complexity reduction, IEEE Trans. on Fuzzy Systems 9, 516-524.
T. Suzuki and T. Furuhashi (2001) Evolutionary algorithm based fuzzy modeling using conciseness measure, Proc. of Joint IFSA-NAFIPS International Conference, 1575-1580.
E. Zitzler, K. Deb, and L. Thiele (2000) Comparison of Multiobjective Evolutionary Algorithms: Empirical Results, Evolutionary Computation 8, 173-195.
E. Zitzler and L. Thiele (1999) Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach, IEEE Trans. on Evolutionary Computation 3, 257-271.