MultiGrid-Based Fuzzy Systems for Function Approximation

Luis Javier Herrera 1, Héctor Pomares 1, Ignacio Rojas 1, Olga Valenzuela 2, and Mohammed Awad 1

1 University of Granada, Department of Computer Architecture and Technology, E.T.S. Computer Engineering, 18071 Granada, Spain, http://atc.ugr.es
2 University of Granada, Department of Applied Mathematics, Faculty of Science, Granada, Spain

Abstract. In this paper we make use of a modified Grid-Based Fuzzy System architecture, which may provide an exponential reduction in the number of rules needed. We also introduce an algorithm that, from a set of given I/O training points, automatically determines the pseudo-optimal architecture proposed as well as the optimal parameters needed (number and position of membership functions, and fuzzy rule consequents). The suitability of the algorithm and the improvement obtained in both performance and efficiency are shown in an example.

1 Introduction

The estimation of an unknown model from a set of input/output data is a crucial problem in many scientific and engineering areas, and a great deal of research effort has been devoted to it. The objective is to obtain a model from which to predict the expected output for any new input data. Regression or function approximation problems deal with continuous input/output data, while classification problems deal with discrete, categorical output data. In this paper we are concerned with function approximation problems, in which we want to obtain the model that best approximates the desired continuous output for any input data. Several authors have used fuzzy systems to deal with the problem of function approximation. One of the first studies in this context was carried out by Wang and Mendel [1], who presented a general method for combining numerical and linguistic information into a fuzzy rule table.
A procedure was proposed in which each datum generates a rule, though this approach produces an enormous number of rules when the input data set is large. Other approaches have attempted to solve function approximation problems by means of clustering techniques [2,9]. In general, two main approaches can be taken for partitioning the input space. On the one hand, the use of fuzzy clusters (see Fig. 1a) performs a marginal subdivision of the input space that obviously depends on the number of rules used to reach the objective. This approach has the disadvantage that the whole input

R. Monroy et al. (Eds.): MICAI 2004, LNAI 2972, pp. 252-261, 2004. © Springer-Verlag Berlin Heidelberg 2004
space might not be covered properly: some input space regions might be left uncovered by any rule. Besides, the use of clustering for function approximation problems generally does not take into account the interpolation properties of the approximator system [3].

Fig. 1. a) Clustering techniques for function approximation. b) Grid techniques for function approximation

On the other hand, Grid-Based Fuzzy Systems (see Fig. 1b) provide a thorough coverage of the whole input space, which makes them especially well-suited for low-dimensional function approximation problems. Several previous works have shown the great performance that can be reached using this kind of input space partitioning. Nevertheless, in this approach the number of rules used by the fuzzy system increases exponentially with the number of input variables and with the number of membership functions per variable. This growth results in a loss of efficiency and in the loss of one of the main properties of fuzzy systems: their understandability and interpretability. In this paper we use a very simple structure to overcome the curse of dimensionality for Grid-Based Fuzzy Systems. Apart from presenting this simple and convenient kind of fuzzy system, we also provide an algorithm that, when possible, selects the group of variables that will form each sub-grid, resulting in a MultiGrid structure. Once we know the hard-structure of our MultiGrid system, we provide an adaptive algorithm to select the optimal parameters and fine-structure of the system, obtaining the final optimal function approximator for the given data set.
2 MultiGrid-Based Fuzzy System (MGFS) Architecture

When we have a high number of input variables, an N-dimensional grid might seem useless for our aim of approximating the input points, since having too many rules, each with too many antecedents, results in
an incomprehensibly huge model. Managing so many parameters may also create an efficiency bottleneck, making the system impossible to optimize. For a high-dimensional space, we therefore propose to use systems of the form [4]:

Fig. 2. MultiGrid-Based Fuzzy System (MGFS)

Each group of variables is used to define a Grid-Based Fuzzy System (GBFS), from which a set of rules is obtained in the form [5]:

IF $x_1$ is $X_1^{i_1}$ AND ... AND $x_N$ is $X_N^{i_N}$ THEN $R_i^p = R_{i_1 i_2 \ldots i_N}$ (1)

where $R_i^p$ is the $i$-th rule of the $p$-th GBFS. Thus, all the rules from all the GBFSs form the whole MGFS, whose output is obtained by normalizing over the set of GBFSs. Therefore, the final output of the system for any input value $\vec{x} = (x_1, x_2, \ldots, x_N)$ can be expressed as follows:

$$F(\vec{x}, MF, R, C) = \frac{\sum_{p=1}^{P} \sum_{j=1}^{R_p} R_j^p \prod_{m=1}^{N_p} \mu_m^{jp}(x_m)}{\sum_{p=1}^{P} \sum_{j=1}^{R_p} \prod_{m=1}^{N_p} \mu_m^{jp}(x_m)} \quad (2)$$
Fig. 3. Different MGFS topologies. a) One simple topology considers one GBFS per variable, thus having simple rules with a single antecedent, one for each membership function of each variable. b) A large number of more complex topologies is possible; here we show a two-GBFS topology: the first GBFS has variables x1, x2 while the second has variables x2, x3, x4. Note how a single variable may appear in several GBFSs with different membership function distributions. c) The most expensive topology has a single GBFS for all the variables. The number of rules here might be too high in terms of interpretability and efficiency.

where the dependency of the output function is stated explicitly on the membership function structure of the system MF, on the consequents of the whole set of rules R, and on the hard-structure of the system $C = \{\{x_1^1, x_2^1, \ldots, x_{N_1}^1\}, \{x_1^2, x_2^2, \ldots, x_{N_2}^2\}, \ldots, \{x_1^P, x_2^P, \ldots, x_{N_P}^P\}\}$, i.e., the input variables entering each individual GBFS. Several architectures are therefore possible for any given problem with a set of input variables (see Fig. 3). The simplest case is when each variable forms a single set (some variables may even be absent if they have no influence on the output of the system); then each rule of each set of variables has a single antecedent. Many more complex configurations are possible, for all the combinations of input variables, up to keeping only one set with the whole number of input variables, which is the case of a single Grid-Based Fuzzy System (GBFS).
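A minimal sketch (not the authors' code) of how the MGFS output of Eq. (2) can be evaluated follows. The function names `tri_mf` and `mgfs_output`, and the representation of the structure C as a list of `(variables, centers, consequents)` triples, are assumptions made for this illustration; triangular-partition membership functions are used, as in Sect. 4:

```python
import numpy as np

def tri_mf(x, centers):
    """Membership degrees of scalar x in a 1-D triangular partition.

    `centers` are the sorted MF centers on the variable's domain;
    neighbouring triangles overlap so that the degrees sum to 1.
    """
    c = np.asarray(centers, dtype=float)
    mu = np.zeros(len(c))
    if x <= c[0]:
        mu[0] = 1.0
    elif x >= c[-1]:
        mu[-1] = 1.0
    else:
        j = np.searchsorted(c, x) - 1            # c[j] <= x < c[j+1]
        w = (x - c[j]) / (c[j + 1] - c[j])
        mu[j], mu[j + 1] = 1.0 - w, w
    return mu

def mgfs_output(x, subgrids):
    """Eq. (2): weighted average of rule consequents over all GBFSs.

    `subgrids` is a list with one entry per GBFS in the structure C:
    (variable_indices, centers_per_variable, consequent_tensor).
    """
    num, den = 0.0, 0.0
    for vars_p, centers_p, R_p in subgrids:
        # Firing strength of every rule of this sub-grid: outer
        # product of the per-variable membership degrees.
        mus = [tri_mf(x[v], c) for v, c in zip(vars_p, centers_p)]
        w = mus[0]
        for m in mus[1:]:
            w = np.multiply.outer(w, m)
        num += float(np.sum(w * R_p))
        den += float(np.sum(w))
    return num / den
```

For example, the topology finally selected in Sect. 6 (one grid on {x1, x2}, one grid on {x4}) would be encoded as a two-element `subgrids` list.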
Now that we have an architecture that, when possible, can reduce the number of rules exponentially, we will study how to identify the underlying data model structure in order to group the variables and form the optimal MultiGrid-Based Fuzzy System (MGFS).

3 Hard-Structure Identification

In this section we present a very effective algorithm to determine the GBFSs that will comprise the system hard-structure, the final MGFS, as shown in Fig. 2. Notice how difficult it is to guess the GBFSs that could form the structure of the system. For 4 variables, for example, 4 GBFSs of one variable + 6 GBFSs of two variables + 4 GBFSs of three variables + 1 GBFS of four variables are candidate elements of the structure. We would then have to choose, among every possible grouping of these 15 GBFSs, the one that performs best with the fewest possible rules to form the final MGFS, giving thousands of possible combinations even for this simple problem. To tackle this problem, a Top-Down algorithm is presented next. It starts from a whole, complete and effective grid fuzzy system and decreases its complexity step by step while possible. At each step it builds a simpler MGFS, keeping the optimal number of membership functions per variable and recalculating the consequents of the new rules. On each step of the algorithm, if the error obtained is similar to that of the previous, more complete configuration, we take the new, simpler configuration as the chosen one. By "similar" we mean that the error does not increase beyond a tolerance level. If the error obtained is higher, another alternative is chosen. This continues until no simpler MGFS can be obtained without exceeding the error level. The detailed algorithm is presented now:

Top-Down Algorithm
1: Initialize the fuzzy system with a complete grid, setting the optimal number of membership functions per input variable.
2: while further steps can be performed do
3:   NumberOfGroups = the current number of variable groups
4:   for I = 1 : NumberOfGroups do
5:     Decompose group I into all the possible groups having one variable less; add them temporarily to the set of groups, removing group I
6:     Temporarily remove any group contained in a larger one
7:     Evaluate the system configuration and take the overall error
8:     if the new error <= previous error + tolerance then
9:       Make the previous decomposition definitive
10:      for J = 1 : NumberOfNewAddedSubGroups do
11:        Temporarily remove subgroup J
12:        Temporarily remove any group contained in a larger one
13:        Evaluate the system configuration and take the overall error
14:        if the new error <= previous error + tolerance then
15:          Make the elimination of subgroup J definitive
16:        else
17:          Undo the elimination of subgroup J
18:        end if
19:      end for
20:    else
21:      Undo the previous decomposition
22:    end if
23:  end for
24: end while
25: Return the final optimal configuration for the MultiGrid-Based Fuzzy System

In steps 7 and 13, EVALUATE means optimizing the consequents and evaluating the error on the data set of input/output points. The system configuration is obtained by taking the number of membership functions per input variable, placing the membership functions equally distributed over each variable's input domain, and forming the rules of each sub-grid. Given the data set D, we apply a Least Squares Error (LSE) algorithm to optimize the rule consequents [6]. The well-known expression for the squared error, given the data set D, the distribution of membership functions MF, the rule consequents R, and the MGFS configuration C, is:

$$J(D, MF, R, C) = \sum_{\vec{x}_k \in D} \left( y_k - F(\vec{x}_k, MF, R, C) \right)^2 \quad (3)$$

Differentiating J with respect to each rule consequent gives a linear system of equations with R parameters and R equations. This procedure for calculating the rule consequents is independent of the form and distribution of the membership functions. Singular Value Decomposition (SVD) is the method used to solve the system of equations [7]; due to the high redundancy that may appear in the system matrix, this method is well suited to our problem. Once the rule consequents have been optimally calculated, the error of the MGFS is measured using the Normalized Root-Mean-Square Error (NRMSE) [6]. After applying the whole algorithm we have the pseudo-optimal structure of groups of variables. It then remains to perform a final parameter tuning to have the system completely fitted to the data set D.
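The consequent optimization of Eq. (3) can be sketched as follows. This is an illustrative implementation under an assumption of our own: the normalized firing strengths of all rules of all sub-grids are stacked column-wise into a design matrix `A`, so that the model output is `A @ R`. NumPy's `lstsq` solves the resulting linear least-squares problem via SVD, which tolerates the redundant columns a MultiGrid structure typically produces; the names `lse_consequents` and `nrmse` are hypothetical:

```python
import numpy as np

def lse_consequents(A, y):
    """Rule consequents that minimize Eq. (3) in the least-squares sense.

    A[k, r] is the normalized firing strength of rule r on sample k,
    so the model output on the data set is A @ R.  np.linalg.lstsq
    solves the system via SVD, handling rank-deficient (redundant)
    rule activations gracefully.
    """
    R, *_ = np.linalg.lstsq(A, y, rcond=None)
    return R

def nrmse(y, y_hat):
    """Root-mean-square error normalized by the output variance."""
    return np.sqrt(np.mean((y - y_hat) ** 2) / np.var(y))
```

As a quick check, consequents recovered from noise-free data reproduce the outputs exactly (up to numerical precision), and the corresponding NRMSE is essentially zero.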
4 Parameter Tuning

Now that we have the final MultiGrid structure, let us perform the parameter adjustment so that the error is minimized for a given membership function configuration.
For this, we use a triangular partition configuration [5,9] and make use of the method presented in [6]. We have already explained how to calculate the rule consequents for a given MF configuration (number and position of the membership functions). Let us now study how to set the positions of the MF centers. A two-step algorithm is performed to optimize them. First, an initialization sets the centers to pseudo-optimal values through a heuristic that we explain now. Secondly, a gradient-based method obtains the local minimum for the given initial configuration. The first step is an iterative process with two phases: calculating a slope parameter for each center and adjusting the centers. The objective of this step is to have, on each side of each membership function, the same amount of error according to the data set D. In each iteration, for each center $c_m^{i_m}$ we calculate the value $p_m^{i_m}$:

$$p_m^{i_m} = \frac{1}{\sigma_y^2} \left[ \sum_{x_m^k \in D \cap [c_m^{i_m-1},\, c_m^{i_m}]} e^2(\vec{x}_k) \;-\; \sum_{x_m^k \in D \cap [c_m^{i_m},\, c_m^{i_m+1}]} e^2(\vec{x}_k) \right] \quad (4)$$

A positive value of the parameter $p_m^{i_m}$ means that the contribution of the left side of the MF to the error is higher than that of the right side; we therefore have to move the center of the MF to the left, and vice versa. Afterwards we perform the following movement of the centers:

$$c_m^{i_m} \leftarrow c_m^{i_m} + \begin{cases} \dfrac{c_m^{i_m-1} - c_m^{i_m}}{b} \cdot \dfrac{|p_m^{i_m}|}{T_m^{i_m}}, & \text{if } p_m^{i_m} \geq 0 \\[6pt] \dfrac{c_m^{i_m+1} - c_m^{i_m}}{b} \cdot \dfrac{|p_m^{i_m}|}{T_m^{i_m}}, & \text{if } p_m^{i_m} < 0 \end{cases} \quad (5)$$

Here b is the active radius, the maximum variation distance, used to guarantee that the ordering of the membership function locations remains unchanged (a typical value is b = 2); $T_m^{i_m}$ is the temperature, which indicates how far the center will be moved. This step of the algorithm works iteratively, moving the centers until the error on each side of each center is balanced.
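One pass of this error-balancing heuristic over the centers of a single variable can be sketched as below. This is our own illustrative reading of Eqs. (4)-(5), not the authors' implementation; the function name `adjust_centers` and its argument layout are assumptions:

```python
import numpy as np

def adjust_centers(c, x, err2, sigma_y2, T, b=2.0):
    """One iteration of the center-balancing heuristic, Eqs. (4)-(5).

    c        -- sorted MF centers of one input variable
    x        -- this variable's value on each training sample
    err2     -- squared model error e^2 on each training sample
    sigma_y2 -- output variance (the normalization in Eq. (4))
    T        -- temperature: larger T means smaller moves
    b        -- active radius, keeps the center ordering intact
    """
    c = c.copy()
    for i in range(1, len(c) - 1):           # interior centers only
        left = (x >= c[i - 1]) & (x <= c[i])
        right = (x >= c[i]) & (x <= c[i + 1])
        # Eq. (4): imbalance between left-side and right-side error.
        p = (err2[left].sum() - err2[right].sum()) / sigma_y2
        step = abs(p) / (b * T)
        if p >= 0:
            c[i] += (c[i - 1] - c[i]) * step   # move toward left neighbour
        else:
            c[i] += (c[i + 1] - c[i]) * step   # move toward right neighbour
    return c
```

With more error accumulated on the left of a center, `p` is positive and the center shifts left, as the text above describes; in practice the loop would be repeated until the sides balance.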
The last step is to find a local minimum from this initial configuration. For this purpose, any of the gradient-based methods found in the literature can be chosen (steepest descent, conjugate gradient, the Levenberg-Marquardt algorithm, etc.). We have now described a tool that, for a given MultiGrid configuration and a given membership function configuration, allows us to find pseudo-optimal parameter values for the whole MGFS. But, as in [8], here we go one step further and try to optimize the number of membership functions associated with each input variable of each GBFS that forms the MGFS. This is the task accomplished in the next section.
5 Fine-Structure Identification

We have explained two important phases of our approach to function approximation. The MultiGrid structure algorithm has been presented; it is the key to reducing, in many cases, the complexity of our system exponentially. A parameter adjustment algorithm has also been obtained for a given MultiGrid structure and a fixed number of membership functions per input variable. Let us now explain how to adjust the number of membership functions per input variable according to a final error objective. Overly complex systems may be impractical even though they yield much lower error, and overly simple systems may not perform well enough. The algorithm explained here gives us the last key to obtain the system that best fits the target NRMSE with the least possible complexity. This part of the whole algorithm works together with the parameter tuning to obtain the simplest yet most effective system according to an error limit that we impose. The idea is to begin from a topology where every variable in every sub-grid starts with, for example, two membership functions. The parameter identification sub-algorithm is performed to check whether the current system meets the error goal. Step by step, we check for which sub-grid and which variable adding a new membership function decreases the error the most. There we add a new membership function and execute the parameter identification sub-algorithm again to check whether the error goal has been reached.

6 Simulations

Now that we have all the tools of the whole method for function approximation, let us execute the complete algorithm on a representative example taken from the literature [4,8].
$$F(x_1, x_2, x_3, x_4) = 10 \sin(\pi x_1 x_2) + 0 \cdot x_3 + 5 x_4 + \xi, \quad x_1, x_2, x_3, x_4 \in [0, 1] \quad (6)$$

We have 10,000 training points generated using this function, with a random error ξ of variance 0.1. We first evaluate how the structure is detected. The initial configuration is a whole grid fuzzy system with 5 membership functions per variable. Table 1 shows the steps that the algorithm follows. From this execution we see how the algorithm discards most of the possibilities, taking at each step the only possible configuration according to the stability of the training error. Notice that, out of the thousands of possibilities for forming a MGFS with 4 input variables, fewer than 15 need to be tested following the algorithm steps to reach the optimal MGFS configuration. A final configuration with one grid of two variables (x1 and x2) and one grid of one variable (x4) is taken for parameter adjustment.
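A training set for Eq. (6) can be generated, for instance, as in the sketch below. The random seed is an arbitrary choice of ours; note that x3 has zero weight, so a correct structure-identification run should discard it:

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility

# 10,000 uniformly drawn training points in [0,1]^4, Eq. (6).
X = rng.random((10_000, 4))
noise = rng.normal(0.0, np.sqrt(0.1), size=10_000)   # xi, variance 0.1
y = 10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 0 * X[:, 2] + 5 * X[:, 3] + noise
```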
Table 1. Trace and results of the Top-Down algorithm for the example

Step of the algorithm | Variable groups                    | NRMSE
1                     | {1,2,3,4}                          | 0.025
2, 4, 5               | {2,3,4}, {1,3,4}, {1,2,4}, {1,2,3} | 0.025
9, 11                 | {1,3,4}, {1,2,4}, {1,2,3}          | 0.025
15, 11                | {1,2,4}, {1,2,3}                   | 0.025
15, 11                | {1,2,3}                            | 0.406
17, 11                | {1,2,4}                            | 0.025
15, 2, 4, 5           | {2,4}, {1,4}, {1,2}                | 0.025
9, 11                 | {1,4}, {1,2}                       | 0.025
15, 11                | {1,2}                              | 0.406
17, 11                | {1,4}                              | 0.715
17, 2, 4, 5           | {1,2}, {4}                         | 0.025
9, 11                 | {1,2}                              | 0.406
17, 4, 5              | {4}, {1}, {2}                      | 0.435
21, 2, 25             | {1,2}, {4}                         | 0.025

Notice that this algorithm performs not only the group selection but also a variable selection task. If any variable does not affect the output of the system, it is immediately detected and discarded, further decreasing the complexity of the system for the fine parameter tuning and improving the interpretability and usability of the resulting MGFS. Next, let us check the results of the parameter and fine-structure tuning. Setting a limit of 0.01 for the NRMSE, after applying the whole algorithm, the final number of membership functions needed per input variable is 6 for variables 1 and 2 (0 for variable 3, which was already discarded by the MGFS selection algorithm), and 2 for variable 4. The algorithm adds membership functions to the first two input variables without adding any to variable 4 since, as noted in [8], the linear dependence on this variable is easily captured with two membership functions.

7 Conclusions

In this paper we have presented the utility of a MultiGrid-Based Fuzzy System (MGFS) architecture to reduce the complexity of a fuzzy system model as the number of input variables grows. Moreover, an algorithm has been presented that is capable of finding a suitable MGFS topology together with the pseudo-optimal parameters defining it, in order to model the underlying system expressed by a set of given I/O data points.
By the parameters of the MGFS model we mean not only the positions of the membership functions of every GBFS but also their optimal number for a given target accuracy. Finally, the functioning of the method has been demonstrated through a simple but rather instructive example.
Acknowledgements. This work has been partially supported by the Spanish CICYT Project DPI2001-3219.

References
1. Wang, L.X., Mendel, J.M.: Generating fuzzy rules by learning from examples. IEEE Trans. Syst., Man, Cybern., November/December (1992) 1414-1427
2. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
3. Gonzalez, J., Pomares, H., Rojas, I., Ortega, J., Prieto, A.: A New Clustering Technique for Function Approximation. IEEE Trans. on Neural Networks, Vol. 13, No. 1 (2002) 132-142
4. Gunn, S.R., Brown, M., Bossley: Network Performance Assessment for Neurofuzzy Data Modeling. Lect. Notes Comput. Sci. (1997) 313-323
5. Rojas, I., Pomares, H., Ortega, J., Prieto, A.: Self-Organized Fuzzy System Generation from Training Examples. IEEE Trans. Fuzzy Systems, Vol. 8, No. 1, February (2000) 23-36
6. Pomares, H., Rojas, I., Ortega, J., Prieto, A.: A systematic approach to a self-generating fuzzy rule-table for function approximation. IEEE Trans. Syst., Man, Cybern., Vol. 30 (2000) 431-447
7. Golub, G., Van Loan, C.: Matrix Computations. The Johns Hopkins University Press, Baltimore (1989)
8. Pomares, H., Rojas, I., Gonzalez, J., Prieto, A.: Structure Identification in Complete Rule-Based Fuzzy Systems. IEEE Trans. Fuzzy Systems, Vol. 10, No. 3, June (2002) 349-359
9. Ruspini, E.H.: A new approach to clustering. Information and Control, No. 15 (1969) 22-32