AppART + Growing Neural Gas = high performance hybrid neural network for function approximation


Luis Martí†‡, Alberto Policriti†, Luciano García‡ and Raynel Lazo‡

† DIMI, Università degli Studi di Udine, Italy. ‡ Fac. de Mat. y Comp., Universidad de La Habana, Cuba.

Abstract. We combine two notable streams of neural networks research: Adaptive Resonance Theory (ART) and Growing Neural Gas (GNG) networks. In particular, we modify the AppART neural network formulation by introducing GNG-based training features. The resulting neural network outperforms its original version as well as other neural models, while maintaining the function approximation properties and the hybrid system conception.

1 Introduction

In this work we propose a neural model that combines two streams of connectionist research that have been recognized as notable achievements of modern artificial neural network theory: Adaptive Resonance Theory and Growing Neural Gas networks.

Adaptive Resonance Theory (ART) [1] is a theory of human cognition. ART networks have features, such as match-based stable learning and intrinsic self-organization, that are appealing for constructing a hybrid (symbolic + connectionist) neural system. The search process involved in the production of a network output in ART networks is also of interest, as it is a sort of hypothesis-testing mechanism. Some of these features can also sensibly alleviate the high-parameterization problem associated with Multi-Layer Perceptron (MLP) and Radial Basis Functions (RBF) networks [2].

Growing Neural Gas (GNG) [3] networks are intrinsically growing, self-organizing RBF neural networks based on the Neural Gas [4] model. These models use the cumulative errors associated with the input classes to determine where new classes should be inserted.

AppART [5] is a low-parameterized ART neural model that incrementally approximates continuous-valued multidimensional functions from noisy data using biologically plausible processes. AppART performs a higher-order Nadaraya-Watson [6] regression and can be interpreted as an extension of fuzzy logic's Standard Additive Model [7]. AppART allows the on-line insertion and extraction of fuzzy if-then rules, and it provides a way of justifying network responses. AppART has been proven to have the universal and best approximation properties shared by RBF networks [2].

In this work we modify the original AppART formulation by introducing some GNG-based training characteristics. The resulting neural network, which we named GasART, builds classes of similar inputs that have similar outputs. Each class has a desired output value assigned to it. The output of the network for a given input is the weighted mean of the degrees of membership of the input in the stored classes and the desired output values assigned to each class. A match tracking mechanism induces the creation of more specific classes when the prediction of the network differs from the expected output by more than a given amount. GasART keeps the general input propagation dynamics of AppART, while modifying the way new classes are inserted and the laws that control how these classes are adapted to fit the complexity of the training data. GasART is described in the next section. After that, we show the results obtained when solving a well-known function approximation benchmark problem: the approximation of the Mackey-Glass equation.

The authors wish to thank the Dipartimento di Matematica e Informatica of the Università degli Studi di Udine for its support in the elaboration of this work. Corresponding author: L. Martí, DIMI, Univ. Udine, v. delle Scienze 208, Udine (UD), Italy; marti@dimi.uniud.it
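As an aside for readers unfamiliar with it, the weighted-mean prediction scheme just described is essentially a kernel regression. The following toy sketch (Python with NumPy, our choice of language; this is not code from the paper, and the Gaussian kernel with a single bandwidth parameter is an illustrative assumption) shows a plain Nadaraya-Watson estimate in which class memberships act as weights over the stored desired outputs.

import numpy as np

def nadaraya_watson(x, centers, outputs, bandwidth=0.5):
    # Membership of x in each stored class (illustrative Gaussian kernel).
    w = np.exp(-0.5 * np.sum((x - centers) ** 2, axis=1) / bandwidth ** 2)
    # Prediction: the membership-weighted mean of the desired outputs.
    return (w @ outputs) / w.sum()

# Example: three stored classes in R^2 with scalar desired outputs.
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
outputs = np.array([0.0, 1.0, 0.5])
print(nadaraya_watson(np.array([0.9, 1.1]), centers, outputs))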

2 GasART dynamics and training

GasART has a layer of afferent or input nodes, F1, a classification layer, F2, a prediction layer, P, and an output layer, O. The F2 layer stores classes of inputs. Its activation is a combined measure of the similarity between the input and the prototype of each class, and of the size of the given class. Each class is represented by a Gaussian receptive field. The network output is obtained by propagating the output of the F2 layer through the P and O layers.

2.1 Equations

When an input $x \in \mathbb{R}^n$ is presented to the input layer it is propagated to the F2 layer. F2 has $\bar{N}$ nodes, $N$ of which are committed. Each committed node $j$ models a local density of the input space using a Gaussian receptive field with mean $\mu_j$ and standard deviation $\sigma_j$, and has an associated cumulative error, $e_j$. A node is activated if it satisfies the match criterion. That means that the match function,

$$g_j = \exp\left(-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu_{ji}}{\sigma_{ji}}\right)^{2}\right), \qquad j=1,\dots,N, \tag{1}$$

must be greater than the F2 vigilance parameter, $\rho_{F2}$; accordingly, the input strength of a node is computed as

$$T_j=\begin{cases}\eta_j\,g_j & \text{if } g_j>\rho_{F2},\\ 0 & \text{otherwise,}\end{cases} \tag{2}$$

where $\eta_j$ is a measure of the node's a priori activation probability. The activation of each node is then calculated by normalizing the node's input strength,

$$v_j=\frac{T_j}{\sum_{l=1}^{N}T_l}. \tag{3}$$

The prediction and output layers jointly calculate the prediction of the network. The P layer contains two types of nodes: A and B. There are as many A nodes as there are features in the output vector $y \in \mathbb{R}^m$, and there is only one B node. The A and B nodes calculate their activations, $a_k$ and $b$ respectively, as weighted sums of the F2 activation vector:

$$a_k=\sum_{j=1}^{N}\alpha_{jk}\,v_j,\quad k=1,\dots,m, \qquad b=\sum_{j=1}^{N}\beta_j\,v_j. \tag{4}$$

Nodes of the output layer are connected by one-to-one connections to a corresponding A node and to the B node. Their outputs,

$$o_k=\begin{cases}a_k/b & \text{if } b>0,\\ 0 & \text{otherwise,}\end{cases} \tag{5}$$

represent the most probable output value, $o(x)$, expected after an input $x$ has been presented.
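The forward dynamics of Eqs. (1)-(5) condense into a few vectorized lines. The sketch below is our reconstruction, not the authors' code; the array names (mu, sigma, eta, alpha, beta) and the argument rho_f2 are our own labels for the quantities defined above.

import numpy as np

def gasart_forward(x, mu, sigma, eta, alpha, beta, rho_f2):
    # x: (n,) input; mu, sigma: (N, n) receptive fields; eta: (N,);
    # alpha: (N, m) F2-to-A weights; beta: (N,) F2-to-B weights.
    # Eq. (1): Gaussian match function of each committed F2 node.
    g = np.exp(-0.5 * np.sum(((x - mu) / sigma) ** 2, axis=1))
    # Eq. (2): input strength; nodes failing the vigilance test stay silent.
    T = np.where(g > rho_f2, eta * g, 0.0)
    # Eq. (3): normalized F2 activations (all zero if nothing resonates).
    total = T.sum()
    v = T / total if total > 0 else np.zeros_like(T)
    # Eq. (4): A- and B-node activations as weighted sums of v.
    a = alpha.T @ v
    b = beta @ v
    # Eq. (5): most probable output; zero acts as the default answer.
    o = a / b if b > 0 else np.zeros_like(a)
    return g, v, o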

2.2 Error handling and match tracking

After the presentation of an input, if no F2 node is active an uncommitted node must be committed. The task of detecting when an input is not sufficiently coded in F2 is accomplished by the F2 gain control, $G_{F2}$, which fires if no committed node is active. The signal

$$G_{F2}=\begin{cases}1 & \text{if } \max_{j=1,\dots,N} v_j = 0,\\ 0 & \text{otherwise,}\end{cases} \tag{6}$$

is used to commit an uncommitted node. It can also be used to offer an "I don't know" answer during the non-adaptive use of the network. The E node computes the prediction error, $\varepsilon$, measuring it as an absolute error,

$$\varepsilon=\lvert o - y\rvert. \tag{7}$$

Incorrect predictions are detected by the output gain control, $G_O$. $G_O$ compares the error measured by the E node with an output error vigilance parameter, $\rho_O$:

$$G_O=\begin{cases}1 & \text{if } \varepsilon>\rho_O,\\ 0 & \text{otherwise.}\end{cases} \tag{8}$$

If $G_O$ remains inactive, learning takes place in F2 and P. If $G_O$ fires, a match tracking mechanism raises the F2 vigilance from its base value $\bar{\rho}_{F2}$, which is the minimum vigilance accepted. The purpose is to reset currently active categories that might be interfering with the calculation of an accurate prediction. A straightforward approach consists in raising the vigilance to the minimum activation,

$$\rho_{F2}=\min_{j=1,\dots,N;\ v_j>0} g_j, \tag{9}$$

thereby deactivating the least active F2 node.

2.3 Learning

The learning rule used in F2 is based on the gated steepest descent learning rule [1]. First, the prediction error obtained in this input presentation is added to the cumulative error of the nodes that intervened in the production of the prediction. The amount of error added is modulated by the node activation, $v_j$:

$$e_j(t+1)=e_j(t)+\varepsilon\,v_j. \tag{10}$$

After this, the cumulative category activation, $n_j$, is updated as

$$n_j(t+1)=n_j(t)+v_j \tag{11}$$

in order to represent the amount of training that has taken place in the $j$-th node. The use of $n_j$ equally weights inputs over time, with the intention of measuring their sample statistics. Among all active nodes, the best-matching node (BMN) is determined as the one with maximum activation, $v_j$. A second best-matching node (SBMN) is then selected; if it is not bound to the BMN, a link between them is established with age $0$. If the link already exists, its age is reset to zero. The center of the BMN receptive field is updated as

$$\mu_{BMN}(t+1)=\left(1-\frac{v_{BMN}}{n_{BMN}}\right)\mu_{BMN}(t)+\frac{v_{BMN}}{n_{BMN}}\,x. \tag{12}$$
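Putting Eqs. (6)-(12) together, one input presentation of the training loop could look like the sketch below. This is again our reconstruction under the notation above: commit_new_node is a hypothetical helper standing in for the commitment procedure of Section 2.4, net is assumed to be a simple container of the arrays used in the forward sketch (plus the base vigilance rho_f2_base and the accumulators err_acc, n_acc), and the aggregation of the error over output components is our assumption. The P-layer updates of Eqs. (13)-(15) are omitted here for brevity.

import numpy as np

def gasart_train_step(x, y, net, rho_out):
    g, v, o = gasart_forward(x, net.mu, net.sigma, net.eta,
                             net.alpha, net.beta, net.rho_f2)
    if v.max() == 0.0:
        # Eq. (6): G_F2 fires -- no committed node resonates.
        commit_new_node(net, x)  # hypothetical helper, cf. Section 2.4
        return
    err = np.abs(o - y).max()        # Eq. (7): prediction error
    if err > rho_out:                # Eq. (8): G_O fires
        # Eq. (9): match tracking; the strict inequality of Eq. (2)
        # silences the least active node on the next presentation.
        net.rho_f2 = g[v > 0].min()
        gasart_train_step(x, y, net, rho_out)
        net.rho_f2 = net.rho_f2_base  # vigilance returns to its base value
        return
    active = v > 0
    net.err_acc[active] += err * v[active]  # Eq. (10): cumulative error
    net.n_acc[active] += v[active]          # Eq. (11): cumulative activation
    bmn = int(v.argmax())                   # best-matching node
    lr = v[bmn] / net.n_acc[bmn]
    net.mu[bmn] = (1.0 - lr) * net.mu[bmn] + lr * x   # Eq. (12)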

All nodes bound to the BMN also modify their centers,

$$\mu_j(t+1)=\left(1-\kappa\,\frac{v_j}{n_j}\right)\mu_j(t)+\kappa\,\frac{v_j}{n_j}\,x, \tag{13}$$

but with a rate $\kappa \in (0,1)$ that controls the amount of change. After this, the age of the links emanating from the BMN is incremented. When a node is committed, and after each training iteration, the standard deviations are recomputed: they are set to the mean distance between the center of the node and the centers of the nodes bound to it [3]. In the P layer, $\alpha_{jk}$ is adapted to represent the corresponding cumulative expected output learned by each A node:

$$\alpha_{jk}(t+1)=\alpha_{jk}(t)+\gamma\,v_j\,y_k, \tag{14}$$

where $\gamma>0$ is a small constant. The weights of the B node, $\beta_j$, are updated in a similar way, but tracking the amount of learning that has taken place in each F2 node:

$$\beta_j(t+1)=\beta_j(t)+\gamma\,v_j. \tag{15}$$

2.4 Creation of input classes

GasART is initialized with two committed nodes ($N=2$). The centers of these nodes can be initialized either to the first two inputs of the training set or to random values. The two nodes are bound to each other with a link that indicates their topological vicinity. Each link has an age, which is initialized to $0$ when a new link is set up. A link between two nodes is removed if its age becomes larger than a threshold $age_{\max}$. If the signal $G_{F2}$ fires, an uncommitted node is committed and $N$ is incremented. The newly committed node, indexed by $N$, is initialized with $v_N=1$, $e_N=0$ and $n_N=0$. The two units with the largest accumulated error are bound to node $N$. Subsequently, learning proceeds as usual.

3 Mackey-Glass equation approximation

We now focus on the problem of approximating the Mackey-Glass equation. We aim to study GasART's performance as a function approximation algorithm, comparing it with other neural models such as the MLP, the RBF network and the original AppART. The Mackey-Glass equation,

$$\frac{dx}{dt}=\frac{a\,x(t-\tau)}{1+x(t-\tau)^{c}}-b\,x(t), \tag{16}$$

is a time-delay differential equation that has been proposed as a model of white blood cell production [8]. The constants are commonly set to $a=0.2$, $b=0.1$ and $c=10$. The delay parameter $\tau$ determines the behavior of the system: for $\tau \geq 17$ the system produces a chaotic attractor. For our simulations we chose $\tau=30$ and an input window of 6 time steps. A set of 4000 items was generated; 3200 were used as the training set and the remaining 800 as the test set. When applying AppART and GasART a voting strategy was used, with the number of voting networks set to 10 per cent of the size of the training set. In the case of the MLP, a backpropagation algorithm was used to fit the network. For the RBF network a hybrid training method (unsupervised in the hidden layer, supervised in the output layer) was used (1).

(1) Both of these training techniques are broadly discussed elsewhere, cf. [2].
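For concreteness, the benchmark series can be generated along the following lines. This is a minimal sketch under stated assumptions, not the paper's setup: Eq. (16) is integrated with a crude Euler step (dt = 1.0; a finer step with subsampling would be more faithful), the initial history is held constant at 1.2, the 6-step input window is taken over consecutive samples with a one-step-ahead target, and so the resulting split sizes differ slightly from those reported.

import numpy as np

def mackey_glass(n_samples, tau=30, a=0.2, b=0.1, c=10, dt=1.0, x0=1.2):
    # Euler integration of Eq. (16) with a constant initial history.
    hist = int(tau / dt)
    x = np.full(n_samples + hist, x0)
    for t in range(hist, n_samples + hist - 1):
        x_tau = x[t - hist]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** c) - b * x[t])
    return x[hist:]

series = mackey_glass(4000)
# Input window of 6 consecutive time steps, one-step-ahead target.
X = np.stack([series[i:i + 6] for i in range(len(series) - 6)])
y = series[6:]
X_train, y_train, X_test, y_test = X[:3200], y[:3200], X[3200:], y[3200:]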

Table 1: Mean square errors of the predictions of the Multi-Layer Perceptron (MLP), Radial Basis Functions network (RBF), AppART and GasART on the Mackey-Glass problem test set.

Neural model    Mean Square Error    Nodes needed
MLP
RBF
AppART
GasART

The results obtained (2) and summarized in Table 1 clearly show that GasART not only produces predictions two orders of magnitude more accurate than AppART's, but also commits fewer nodes, thus creating a more compact representation of the input space.

4 Concluding remarks

We have presented GasART, a neural network that combines ART and GNG networks. It modifies the learning laws of AppART to improve its accuracy and performance while keeping its main characteristics. As a consequence, GasART incrementally approximates any function to any degree of accuracy from noisy training samples. In the benchmark tests carried out, GasART outperformed the other neural models tested, generating compact knowledge representations. Because of the brevity of this communication we did not extend the study to more problems or models; however, more tests are needed. We are currently studying alternative ways of making the merger of the ART and GNG models even tighter, and we are starting to apply GasART to bioinformatics problems.

(2) The results that do not involve GasART are taken from [5]. In that work other neural networks were also applied; GasART outperforms all of them.

References

[1] S. Grossberg, Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control. Boston: Reidel, 1982.

[2] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.

[3] B. Fritzke, "Fast learning with incremental RBF networks," Neural Processing Letters, vol. 1, pp. 2-5, 1994.

[4] T. M. Martinetz, S. G. Berkovich, and K. J. Schulten, "'Neural-gas' network for vector quantization and its application to time-series prediction," IEEE Transactions on Neural Networks, vol. 4, pp. 558-569, 1993.

[5] L. Martí, A. Policriti, and L. García, "AppART: An ART hybrid stable learning neural network for universal function approximation," in Hybrid Information Systems (A. Abraham and M. Koeppen, eds.), Heidelberg: Physica-Verlag, 2002.

[6] E. A. Nadaraya, "On estimating regression," Theory of Probability and Its Applications, vol. 10, 1964.

[7] B. Kosko, Fuzzy Engineering. New York: Prentice Hall, 1997.

[8] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, pp. 287-289, 1977.
