Matrix Inference in Fuzzy Decision Trees

Santiago Aja-Fernández
LPI, ETSIT Telecomunicación, University of Valladolid, Spain
sanaja@tel.uva.es

Carlos Alberola-López
LPI, ETSIT Telecomunicación, University of Valladolid, Spain
caralb@tel.uva.es

Abstract

A matrix method for fuzzy systems (FITM) is used to perform inferences in fuzzy decision trees (FDTs). The method is applied once the tree has been designed and built. With transition matrices the output is computed faster, and some undesired weighting effects of the FDT can be avoided.

Keywords: FITM, matrix inference, fuzzy decision trees.

1 Introduction

Decision trees have proved to be a simple and robust method to partition the attribute space and to make decisions based on symbolic inputs. They are by nature readily interpretable and well suited to classification problems [1]. A decision tree consists of nodes that test attributes, edges that branch on the attribute values, and leaves that assign class names.

Different methods have been proposed to create the space partitioning that generates the tree. CART and ID3 [2] are two important algorithms for this task. The main ideas behind them coincide: partition the sample space in a data-dependent way and then represent the partition as a tree. Both aim to minimize the size of the tree while optimizing some quality criterion. CART does not require an a priori partitioning; it is based on dynamically computed thresholds for continuous domains. ID3 assumes small-cardinality domains and requires an a priori partitioning.

Umano et al. [3] proposed a fuzzy extension of ID3, the fuzzy ID3 algorithm. It is intended for sets of fuzzy data and generates a fuzzy decision tree using fuzzy sets defined a priori by the user. Many further studies on fuzzy trees have been reported [1, 2, 4].

In this paper we focus on the fuzzy inference performed once the tree has been generated.
As a starting point we assume a fuzzy decision tree (FDT) that has been created with some well-known technique. Our purpose is not to modify or improve any existing tree-building algorithm, but to improve the inference method over a well-defined tree. The examples presented here were built with the fuzzy ID3 algorithm.

To perform the inference we use a recently proposed methodology based on transition matrices, known as FITM (Fast Inference using Transition Matrices) [5]. FITM was initially intended for computing with words (CWW) applications [6], but it may be of interest in other fields, such as control, image processing or hierarchical fuzzy systems (HFSs) modeling [7].

The FITM methodology was proposed to perform inferences efficiently in SAM (Standard Additive Model [8]) fuzzy systems (FSs). It represents each input to the FS as a vector whose coordinates are the contributions to the input of each element of the input linguistic variable (LV). The authors have shown that, under this assumption, a great deal of the operations that SAMs have to carry out can be precomputed and stored as transition matrices, so only a few operations have to be performed on-line, leading to a considerable reduction of the overall computational complexity of the inference
process. In FITM environments the inputs to the FSs must satisfy one property: they are required to be linear combinations of the fuzzy sets that make up the input LV. This requirement typically holds in CWW applications. In FDTs it also holds when the features of the samples are expressed in some descriptive language, so it should be possible to rebuild an FDT using the FITM methodology and to benefit from its associated computational savings.

This paper is structured as follows. Section 2 reviews the FITM procedure. Section 3 introduces the use of the FITM methodology in FDTs and proposes two methods of inference. Section 4 discusses implicit and explicit rule bases.

2 FITM Background

This methodology was originally proposed in [5] to perform inferences efficiently in CWW environments using SAM-FSs. It is totally equivalent to SAM inference (in terms of the output centroid), with a considerable reduction of the overall computational complexity of the inference process (see footnote 1).

In FITM environments each input is represented as a vector in the input space, as shown in [5, 10]. Its coordinates are the contributions of each fuzzy set of the input linguistic variable to the input set. The relation between the sets and the rule base is coded in a small number of matrices. The key of the method is the possibility of precomputing a great deal of operations, so only a small fraction of the overall complexity of the SAM system has to be performed on-line. In addition, storage needs are moderate, since only the transition matrices defined for each FS are needed to perform the inference. The intermediate data structures needed to obtain the final matrices can be discarded once these transition matrices have been calculated.

Footnote 1: Although the FITM procedure was originally proposed for SAM FSs, a natural extension to non-linear FSs has been carried out in [9].
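To make the vector representation concrete, consider a hypothetical input LV with three terms $A_1$, $A_2$, $A_3$ (the labels and the coefficients below are illustrative assumptions, not values taken from [5]). An input described as a mixture of the first two terms would be written as

$$X = 0.7\,A_1 + 0.3\,A_2 + 0\,A_3 \quad\Longrightarrow\quad \beta = [0.7,\ 0.3,\ 0]^T,$$

while an input that coincides with the single term $A_2$ would give $\beta = [0,\ 1,\ 0]^T$. In this representation the on-line inference operates on the coefficient vector rather than on the membership functions themselves.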
In the following subsection, the method to build a FITM inference engine for a 2-input single-output (2-ISO) FS is presented. For a MISO system see [5].

[Figure 1: Input/output distribution in a FITM (fuzzy inputs, FS, fuzzy output).]

2.1 Construction of matrices: 2-ISO Case

Assume a 2-input single-output fuzzy system such as the one in Fig. 1, with inputs X and Y and output Z. The inputs and the output are all fuzzy sets defined on their respective LVs (see footnote 2). The first input, X, is an LV consisting of M possible fuzzy sets $A_k$ defined on the universe $U \subset \mathbb{R}$; the second input, Y, is an LV consisting of N possible fuzzy sets $B_l$ defined on the universe $V \subset \mathbb{R}$; and the output LV consists of L possible fuzzy sets $D_n$ defined on the universe $W \subset \mathbb{R}$. Provided that the inputs can be expressed in vector form [5, 10],

$$X = \sum_{k=1}^{M} \beta_k A_k = \beta^T A, \qquad Y = \sum_{l=1}^{N} \alpha_l B_l = \alpha^T B,$$

then the output is

$$Z = \sum_{n=1}^{L} \gamma_n D_n = \gamma^T D.$$

The whole SAM inference process can be rewritten using transition matrices as

$$\gamma = \left( \sum_{l=1}^{N} \alpha_l \Omega_l \right) \beta \qquad (1)$$

with $\Omega_l$ the transition matrix of the system for input $Y = B_l$.

Footnote 2: When the input to a FS is a crisp value x, the activation of each fuzzy set A is $\mu_A(x)$. When the input is a fuzzy set X (as opposed to a crisp value), the activation is $\mu_A(X) = A \circ X$ or, equivalently, $\mu_A(x) \circ \mu_X(x)$, with a properly defined activation operator $\circ$ [11, 12].

In order to build these matrices
we must define some intermediate data structures that can be discarded once the matrices $\Omega_l$ have been calculated. First of all, we must create the activation matrix of each input; for input X it is defined as (see footnote 3)

$$R_A = \begin{bmatrix} A_1 \circ A_1 & \cdots & A_1 \circ A_M \\ \vdots & \ddots & \vdots \\ A_M \circ A_1 & \cdots & A_M \circ A_M \end{bmatrix} = [A_1 \ldots A_M]^T \circ [A_1 \ldots A_M] = A \circ A^T \qquad (2)$$

with $A_j$ the different fuzzy sets of the input LV; we assume that the operation $\circ$ represents the sum-product composition. $R_B$ is defined accordingly for the second input.

The next step is to calculate the matrices $G_l$ that bear the relation between the inputs:

$$G_l = R_A \otimes [R_B E_l] \qquad (3)$$

where $\otimes$ is the Kronecker tensor product, and $R_A$ and $R_B$ are the activation matrices of each input. $E_l$ is a column selection vector, i.e., a column vector with all entries zero except the one at row l, which is one. Its purpose is to extract column l from matrix $R_B$.

The rule base of the system is coded in matrix C. This is a selection matrix with as many rows as rules in the rule base; in row j all the entries are zero except the one at column i, which is one, if the output consequent of rule j is $D_i$. (If the output is not just one set but a membership degree to each of the output sets, the entries of the row are those membership degrees instead of ones and zeros.)

Finally, the transition matrices can be calculated as

$$\Omega_l = C^T G_l, \qquad l = 1, \ldots, N \qquad (4)$$

The output centroid, if desired, can be calculated from the output vector $\gamma$ by

$$z_c = \frac{[c_1\ c_2\ \cdots\ c_L]\,\gamma}{[1\ 1\ \cdots\ 1]\,\gamma} = \frac{c^T \gamma}{\mathbf{1}^T \gamma} \qquad (5)$$

with c the vector of the output set centroids $c_i$ and $\gamma$ as defined in (1). As previously mentioned, this centroid totally coincides with the one from the conventional SAM-FS.

Footnote 3: If the input X is a crisp value instead of a fuzzy set, matrix $R_A$ becomes the identity matrix.

2.2 General Case

For the case of a multiple-input single-output (MISO) fuzzy system the expressions are extended accordingly [5].
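As a concrete reference point for these expressions, the 2-ISO construction of Section 2.1 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the overlap values in `R_A` and `R_B`, the rule matrix `C`, the output centroids and all function names are hypothetical, and rules are assumed to be ordered with the first input as the slowest-varying index. The sketch precomputes the matrices $\Omega_l$ as in (3) and (4), evaluates (1), and compares the result with a rule-by-rule evaluation in which rule (i, j) fires with the product of its two activations.

```python
# Minimal pure-Python sketch of FITM for a 2-ISO system (Section 2.1).
# All numeric values are hypothetical and chosen only for illustration.

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product; matrices are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def kron(A, B):
    """Kronecker product of A and B; matrices are lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def transition_matrices(R_A, R_B, C):
    """Omega_l = C^T (R_A kron [R_B E_l]) for every l, as in (3)-(4)."""
    N = len(R_B)
    omegas = []
    for l in range(N):
        E_l = [[1.0 if r == l else 0.0] for r in range(N)]  # column selector
        G_l = kron(R_A, matmul(R_B, E_l))                   # (M*N) x M
        omegas.append(matmul(transpose(C), G_l))            # L x M
    return omegas

def fitm_output(omegas, alpha, beta):
    """gamma = (sum_l alpha_l Omega_l) beta, as in (1)."""
    L, M = len(omegas[0]), len(beta)
    return [sum(alpha[l] * omegas[l][n][k] * beta[k]
                for l in range(len(alpha)) for k in range(M))
            for n in range(L)]

def direct_output(R_A, R_B, C, alpha, beta):
    """Rule-by-rule reference: rule (i, j) fires with a_i * b_j."""
    a = [sum(r * x for r, x in zip(row, beta)) for row in R_A]
    b = [sum(r * x for r, x in zip(row, alpha)) for row in R_B]
    N = len(b)
    return [sum(a[i] * b[j] * C[i * N + j][n]
                for i in range(len(a)) for j in range(N))
            for n in range(len(C[0]))]

def centroid(c, gamma):
    """z_c = c^T gamma / 1^T gamma, as in (5)."""
    return sum(ci * gi for ci, gi in zip(c, gamma)) / sum(gamma)

# Hypothetical system with M = 3, N = 2, L = 2.
R_A = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]]
R_B = [[1.0, 0.3], [0.3, 1.0]]
# One row per rule, ordered (A_1,B_1), (A_1,B_2), (A_2,B_1), ...
C = [[1, 0], [1, 0], [0, 1], [1, 0], [0, 1], [0, 1]]

omegas = transition_matrices(R_A, R_B, C)   # precomputed off-line
beta, alpha = [1.0, 0.0, 0.0], [0.0, 1.0]   # X = A_1, Y = B_2
gamma = fitm_output(omegas, alpha, beta)    # on-line step
reference = direct_output(R_A, R_B, C, alpha, beta)
z_c = centroid([0.0, 1.0], gamma)           # assumed output centroids
```

In this sketch `gamma` and `reference` agree, which is the equivalence the section states; only `fitm_output` would need to run on-line. Under the same assumptions, additional inputs would add one Kronecker factor and one index to the transition matrices.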
The relation among coefficients is now given by:

$$\gamma = \left( \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_2} \cdots \sum_{i_F=1}^{N_F} \alpha^1_{i_1} \alpha^2_{i_2} \cdots \alpha^F_{i_F}\, \Omega_{i_1 i_2 \cdots i_F} \right) \beta \qquad (6)$$

where $\alpha^j$ ($j = 1, \ldots, F$) and $\beta$ are the input vectors and $\Omega_{i_1 i_2 \cdots i_F}$ are the transition matrices of the system, one for each combination of indices.

3 Matrix Inference in FDTs

A sample is represented by a set of features expressed in some descriptive language. Samples used as inputs in FDTs usually have non-numeric features, which makes these trees suitable for implementation using FITM. We will suppose that the input features take as values terms that can be expressed in natural language. The features are denoted by $A_j$, and the values they can take by $F_{jl}$. Each of these values has an associated fuzzy set; for simplicity, it will be denoted by $F_{jl}$ as well. For example, if the third feature is hair color, then $A_3$ = {hair color}, $F_{31}$ = light and $F_{32}$ = dark. The output set is Z, and $Z_i$ are the different classes.

[Figure 2: FDT of the example. The root node tests height; the seven numbered leaves and their class memberships ($Z_1$, $Z_2$) are the ones listed as rules in Table 1.]

We will work with the example shown in Fig. 2, with three features: $A_1$ (height) = {low, middle, high}, $A_2$ (weight) = {light, middle, heavy} and
$A_3$ (hair) = {light, dark}, and two output classes, $Z_1$ and $Z_2$. We consider two possible inference methods using transition matrices, which are explored next.

3.1 Direct tree processing

For this first method we keep the tree structure in order not to lose the visual understanding of the process. We make use of the activation matrices defined in (2) to carry out the inference in each node. The steps of the algorithm are as follows:

1. An activation matrix is created for each feature, according to (2). In our example:

$$R_{height} = \begin{bmatrix} 1 & h_1 & 0 \\ h_1 & 1 & h_2 \\ 0 & h_2 & 1 \end{bmatrix}, \qquad R_{hair} = \begin{bmatrix} 1 & p_1 \\ p_1 & 1 \end{bmatrix}, \qquad R_{weight} = \begin{bmatrix} 1 & w_1 & 0 \\ w_1 & 1 & w_2 \\ 0 & w_2 & 1 \end{bmatrix}$$

with $h_i$, $p_i$ and $w_i$ the overlap degrees between fuzzy sets.

2. Input feature vectors are defined according to Section 2:

$$h = [\beta_1\ \beta_2\ \beta_3]^T, \qquad w = [\gamma_1\ \gamma_2\ \gamma_3]^T, \qquad p = [\alpha_1\ \alpha_2]^T$$

(h for height, w for weight and p for hair). For example, if an input sample is h = low, w = heavy and p = dark, the input vectors will be $h = [1, 0, 0]^T$, $w = [0, 0, 1]^T$ and $p = [0, 1]^T$.

3. The output of each node is calculated by multiplying the activation matrix of the node by the corresponding feature vector, e.g. for height:

$$[\beta_1\ \beta_2\ \beta_3]^T = R_{height}\, h$$

(the symbols $\beta_i$ are reused here for the node outputs, which are the values that label the branches in Fig. 3). The output vector must be normalized by its maximum component to balance the weight of each branch in the whole process.

4. The values obtained are brought to the decision tree, as shown in Fig. 3 (only some results are depicted). To get the output value of each leaf, multiply the values of all the branches from the root to that leaf.

[Figure 3: Matrix direct tree processing example. Branches are labelled with the node outputs $\beta_i$ (height), $\gamma_i$ (weight) and $\alpha_i$ (hair). Depicted leaf outputs: $Z_1 = 0.3\,\alpha_1 \gamma_2 \beta_1$, $Z_2 = 0.7\,\alpha_1 \gamma_2 \beta_1$; $Z_1 = 0.1\,\alpha_1 \beta_2$, $Z_2 = 0.9\,\alpha_1 \beta_2$; $Z_1 = 0.1\,\beta_3$, $Z_2 = 0.9\,\beta_3$.]

Hence, all the fuzzy inference is replaced by simple matrix multiplications. When the decision tree is dense, the saving in operations is considerable.

3.2 FITM processing

Once the decision tree has been created (and properly tested), and if it is going to be used in a real application, it is no longer necessary to maintain the tree structure.
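Before the tree structure is dropped, the direct tree processing of Section 3.1 can be sketched in plain Python. This is a hedged illustration rather than the authors' implementation: the overlap degrees (`0.3`, `0.4`, `0.2`) and every identifier are hypothetical, while the leaf memberships are the ones of the example tree. Each node output is the activation matrix times the feature vector, normalized by its maximum component, and each leaf output is the product of the branch values on its path times the leaf memberships. How the leaf outputs are aggregated afterwards is not covered by the sketch.

```python
# Sketch of direct tree processing (Section 3.1) on the example tree.
# Overlap degrees are hypothetical; leaf values come from the example.

def node_output(R, v):
    """Activation matrix times input vector, normalized by its maximum."""
    out = [sum(r * x for r, x in zip(row, v)) for row in R]
    top = max(out)
    return [o / top for o in out] if top > 0 else out

def tridiagonal(a, b):
    """3x3 activation matrix with overlaps a, b between neighbouring sets."""
    return [[1.0, a, 0.0], [a, 1.0, b], [0.0, b, 1.0]]

R = {
    "height": tridiagonal(0.3, 0.3),    # h1, h2 (assumed values)
    "weight": tridiagonal(0.4, 0.4),    # w1, w2 (assumed values)
    "hair": [[1.0, 0.2], [0.2, 1.0]],   # p1 (assumed value)
}

# Each leaf: (index of the branch taken at every tested feature, (Z1, Z2)).
# Indices: height low/middle/high, weight light/middle/heavy, hair light/dark.
LEAVES = [
    ({"height": 0, "weight": 0}, (0.2, 0.8)),
    ({"height": 0, "weight": 1, "hair": 0}, (0.3, 0.7)),
    ({"height": 0, "weight": 1, "hair": 1}, (0.7, 0.3)),
    ({"height": 0, "weight": 2}, (0.8, 0.2)),
    ({"height": 1, "hair": 0}, (0.1, 0.9)),
    ({"height": 1, "hair": 1}, (0.5, 0.5)),
    ({"height": 2}, (0.1, 0.9)),
]

def leaf_outputs(sample):
    """Return (Z1, Z2) for every leaf, weighted by the product of branches."""
    branch = {f: node_output(R[f], v) for f, v in sample.items()}
    results = []
    for path, (z1, z2) in LEAVES:
        weight = 1.0
        for feature, idx in path.items():
            weight *= branch[feature][idx]
        results.append((z1 * weight, z2 * weight))
    return results

# Sample from the text: height = low, weight = heavy, hair = dark.
sample = {
    "height": [1.0, 0.0, 0.0],
    "weight": [0.0, 0.0, 1.0],
    "hair": [0.0, 1.0],
}
outputs = leaf_outputs(sample)
```

With these assumed overlaps the fourth leaf (low, heavy) keeps its full memberships, while the other leaves are attenuated by the overlap degrees along their paths.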
In this section we propose a method to compress the tree, trading interpretability for compactness, speed and computational savings. The algorithm is as follows:

1. Matrices $R_A$ are created as before.

2. Composition matrices are created as in (3). In our example

$$G_{ij} = R_{height} \otimes \left[ \left( R_{weight} \otimes (R_{hair} E_j) \right) E_i \right], \qquad i = 1, 2, 3 \ \text{and} \ j = 1, 2.$$

3. Construction of the transition matrices $\Omega_{ij}$. In FITM, in order to build these matrices we need to define a rule matrix C out of a complete rule base. Usually, however, a decision tree is equivalent to a fuzzy system with an implicit rule base. The one of the example is shown in Table 1. Only 7 out of the 18 possible rules (3 × 3 × 2) are present. Rule 2, for example, is a complete rule: it takes on values for all the features. Rule 7, on the other hand, takes on a value only for the first feature; the blanks for the second and the third features mean any value. The rule base is in fact complete, but some of its values are implicit. There are
two equivalent methods, (a) and (b) below, to create transition matrices from an implicit rule base such as the one in Table 1.

Rule  Height  Weight  Hair   Z_1  Z_2
1     Low     Light   -      0.2  0.8
2     Low     Middle  Light  0.3  0.7
3     Low     Middle  Dark   0.7  0.3
4     Low     Heavy   -      0.8  0.2
5     Middle  -       Light  0.1  0.9
6     Middle  -       Dark   0.5  0.5
7     High    -       -      0.1  0.9

Table 1: Implicit rule base (a dash marks a blank entry).

(a) To specify the full rule base. The idea is to fill the blanks in the rule base with all the possible values. In our example, rule 1 would be extended to:

Rule  Height  Weight  Hair   Z_1  Z_2
1a    Low     Light   Light  0.2  0.8
1b    Low     Light   Dark   0.2  0.8

Extending all the rules, we come up with an 18-rule base (see footnote 4). From this base we may define matrix C,

$$C = \begin{bmatrix} 0.2 & 0.8 \\ 0.2 & 0.8 \\ 0.3 & 0.7 \\ \vdots & \vdots \\ 0.1 & 0.9 \\ 0.1 & 0.9 \end{bmatrix} \begin{matrix} \text{1a} \\ \text{1b} \\ \text{2} \\ \vdots \\ \text{7b} \\ \text{7c} \end{matrix}$$

and then the transition matrices would be $\Omega_{ij} = C^T G_{ij}$.

Footnote 4: The problem of making an implicit rule explicit is briefly studied in Section 4.

(b) Compression of composition matrices. Matrix C is created directly from the implicit rule base. In our example (from Table 1):

$$C = \begin{bmatrix} 0.2 & 0.3 & 0.7 & 0.8 & 0.1 & 0.5 & 0.1 \\ 0.8 & 0.7 & 0.3 & 0.2 & 0.9 & 0.5 & 0.9 \end{bmatrix}^T$$

Proceeding this way there would be a discrepancy between the sizes of matrices $G_{ij}$ (18 rows) and matrix C (7 rows). Instead of replicating lines in the rule base, we now merge rows in matrices $G_{ij}$. To do so, we must first understand the meaning of these matrices. In our example, row 1 of matrix $G_{ij}$ is related to the inputs {low, light, light} and row 2 to {low, light, dark}. Both are implicit in rule 1 of Table 1: {low, light, any} = {low, light, (light ∪ dark)}. It is easy to prove that this union operation, if carried out by adding rows 1 and 2 of matrix $G_{ij}$, is totally equivalent to the extension of rules in the base proposed for the previous method. Applying this reasoning to all the rows gives the merged matrix, denoted here by $\tilde{G}_{ij}$ to distinguish it from the original one:

$$\tilde{G}_{ij} = \begin{bmatrix} G_{ij}(1) + G_{ij}(2) \\ G_{ij}(3) \\ G_{ij}(4) \\ G_{ij}(5) + G_{ij}(6) \\ G_{ij}(7) + G_{ij}(9) + G_{ij}(11) \\ G_{ij}(8) + G_{ij}(10) + G_{ij}(12) \\ \sum_{k=13}^{18} G_{ij}(k) \end{bmatrix}$$

with $G_{ij}(k)$ the k-th row of matrix $G_{ij}$. Transition matrices are now calculated as $\Omega_{ij} = C^T \tilde{G}_{ij}$.

4.
Output values of the whole tree are calculated using FITM:

$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \left( \sum_{i=1}^{3} \sum_{j=1}^{2} \gamma_i \alpha_j \Omega_{ij} \right) \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} \qquad (7)$$

4 About making rules explicit

Suppose a 2-input 1-output FS, with $A_i$ ($i = 1, \ldots, M$) and $B_j$ ($j = 1, \ldots, N$) the fuzzy sets of the input spaces and $C_k$ ($k = 1, \ldots, L$) the fuzzy sets of the output space. This system will have a rule base with if-then rules such as

If X is $A_i$ and Y is $B_j$ then Z is $C_k$

An implicit rule (for a 2-ISO system) is of the form

If X is $A_i$ then Z is $C_k$

and it must be understood as

If X is $A_i$ and Y is any then Z is $C_k$.

The rule base can be made explicit by adding all the possible values of $B_j$:

If X is $A_i$ and Y is $B_1$ then Z is $C_k$
...
If X is $A_i$ and Y is $B_N$ then Z is $C_k$

The output of the system will be the same with the implicit or the explicit rule set if max-min inference is used. If SAM is used, however, the output will not be the same. For the SAM equation

$$Z = \frac{\sum_j \omega_j A_j(X)\, B_{r(j)}(Y)\, C_{p(j)}}{\sum_j \omega_j A_j(X)\, B_{r(j)}(Y)}$$
the implicit rule base has a term such as $C_k A_i(X)$ and the explicit one $C_k A_i(X)\,(B_1(Y) + \cdots + B_N(Y))$. So an implicit rule in a SAM system is not equal to an explicit one; in fact it has a lower weight in the final result whenever $\sum_j B_j(Y) > 1$. This problem can be solved by adding a weight $\omega_i = \sum_j B_j(Y)$ to that rule.

Note that this can also affect fuzzy decision trees. If linear operators are used, there can be some effects derived from the density of the tree: leaves reached after a larger number of nodes can have a stronger weight than the ones reached after a small number of them. The completion of the rule base done in Section 3.2 for the FITM method indirectly adjusts the weights of the different branch outputs.

5 Conclusions

A new way to work with fuzzy decision trees has been introduced. The FITM method is used to carry out the inference in existing trees in two possible ways. The first one keeps the tree structure unaltered, and the second fuses the tree information into a compact array of transition matrices. The key of the method is that many operations can be precomputed off-line to obtain the transition matrices, so actual inferences are reduced to a few on-line matrix additions and multiplications. The FITM method can also avoid some weighting effects that can appear in FDTs as a product of the implicit rules and the linear operators.

Acknowledgments

The authors acknowledge the Comisión Interministerial de Ciencia y Tecnología for research grants TIC2001-3808-C02-02 and TEC2004-3808-C03-01, and the European Commission for the funds associated to the Network of Excellence SIMILAR (FP6-507609).

References

[1] A. Suárez and J. Lutsko, Globally optimal fuzzy decision trees, IEEE Trans. Pattern Anal. Mach. Intell., no. 12, pp. 1297–1311, Dec. 1999.
[2] C. Janikow, Fuzzy decision trees: issues and methods, IEEE Trans. on System, Man and Cybernetics - Part B: Cybernetics, vol. 28, no. 1, pp. 1–14, Feb. 1998.
[3] M. Umano, H. Okamoto, I.
Hatono, and H. Tamura, Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems, in Proc. of FUZZ-IEEE 94, Orlando, FL, USA, June 1994, pp. 2113–2118.
[4] M. Dong and R. Kothari, Look-ahead based fuzzy decision tree induction, IEEE Trans. Fuzzy Systems, no. 3, pp. 461–468, June 2001.
[5] S. Aja-Fernández and C. Alberola-López, Fast inference in SAM fuzzy systems using transition matrices, IEEE Trans. Fuzzy Systems, vol. 12, no. 2, pp. 170–182, Apr. 2004.
[6] L. A. Zadeh, Fuzzy logic = computing with words, IEEE Trans. Fuzzy Systems, vol. 4, no. 2, pp. 103–111, May 1996.
[7] S. Aja-Fernández and C. Alberola-López, Fuzzy hierarchical systems with FITM, in Proc. of FUZZ-IEEE 04, Budapest, Hungary, July 2004.
[8] B. Kosko, Fuzzy Engineering. New Jersey: Prentice-Hall International, 1997.
[9] S. Aja-Fernández and C. Alberola-López, Fast inference using transition matrices: An extension to non-linear operators, IEEE Trans. Fuzzy Systems, in press.
[10] S. Aja-Fernández and C. Alberola-López, Inference with fuzzy granules for computing with words: A practical viewpoint, in Proc. of FUZZ-IEEE 03, St. Louis, MO, May 2003, pp. 566–571.
[11] S. Aja-Fernández, C. Alberola-López, and G. Cybenko, A fuzzy MHT algorithm applied to text-based information tracking, IEEE Trans. Fuzzy Systems, vol. 10, no. 3, pp. 360–374, June 2002.
[12] G. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic. New Jersey: Prentice-Hall International, 1995.