A Compensatory Wavelet Neuron Model

Sinha, M., Gupta, M. M. and Nikiforuk, P. N.
Intelligent Systems Research Laboratory, College of Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, CANADA
guptam@sask.usask.ca

Abstract

This paper proposes a compensatory wavelet neuron model, which is based on a wavelet activation function. Here, the basis function comprises both summation and multiplicative functions. It is shown in [1] that, for a spectrum of functional mapping and classification problems, a neural network based on compensatory neurons performs better than one based on ordinary neurons, in terms of both prediction accuracy and computational time. The wavelet neuron, on the other hand, is obtained by modifying an ordinary neuron with non-orthogonal wavelet bases [2]. The performances of neural networks based on the different neuron models are also analyzed in this paper.

1 Introduction

Robust performance and quick convergence of a neural network (NN) of small complexity are vital for its wide application. The architectural complexity, which governs the size of a NN, depends on the number of neurons and connections [3]: the larger the number of neurons and connections, the more complex the architecture. Similarly, the learning complexity depends on the learning algorithm. Any NN designed for real-life applications must not be complex, and it must have adequate functional mapping, classification and generalization capabilities. The present investigation explores the feasibility of constructing higher order neuron models which may serve as the basis for the formulation of some powerful neural network architectures. Some benchmark classification and functional mapping problems are addressed to validate the neuron and neural network models developed and reported in this paper. It will be shown that even a simple feedforward neural network can predict a chaotic nonlinear time series, contrary to the conclusion drawn by Yamakawa et al. [2].

2 Neuron Models

The neuron model affects the classification and functional mapping power of a neural network. In the following sections we investigate different existing neuron models and formulate some new neuron models to improve upon the capability of the existing ones.

Basic Neuron Model: The neuron model due to McCulloch and Pitts is given by Equations (1), (2) and (3):

u = \sum_{i=0}^{N} w_i x_i    (1)

y = \phi(u)    (2)

\phi(u) = \gamma \frac{e^{\lambda u/2} - e^{-\lambda u/2}}{e^{\lambda u/2} + e^{-\lambda u/2}}    (3)

where \lambda is a steepness factor and \gamma is a multiplication factor.

Compensatory Neuron Model: Sinha et al. [1] proposed a compensatory neuron model in which each neuron consists of two nonlinearities. Here, we propose a compensatory neuron model with one nonlinearity, as shown in Fig. 1. This forms the basis for formulating the compensatory neural network architecture (CNNA) shown in Fig. 4. It not only reduces the number of neurons required to solve some of the benchmark classification and mapping problems, but also improves the convergence speed and reduces the computational burden. The compensatory neuron combines the summation and product aggregations of its weighted inputs through the single nonlinearity \phi(u) defined in Equation (3).

Figure 1. Compensatory Neuron Model
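As a minimal illustration of these two models, the sketch below implements the basic neuron of Equations (1)-(3) and a compensatory aggregation in the spirit of Fig. 1. The function names and the mixing rule (a convex combination of the summation and product aggregations controlled by a compensatory parameter c) are assumptions made for illustration, not the paper's exact compensatory equations.

```python
import numpy as np

def phi(u, lam=1.0, gamma=1.0):
    """Bipolar sigmoid of Eq. (3): gamma * tanh(lam * u / 2)."""
    return gamma * np.tanh(lam * u / 2.0)

def basic_neuron(x, w, lam=1.0, gamma=1.0):
    """McCulloch-Pitts style neuron, Eqs. (1)-(2)."""
    u = np.dot(w, x)             # Eq. (1): summation aggregation
    return phi(u, lam, gamma)    # Eqs. (2)-(3)

def compensatory_neuron(x, w_s, w_p, c=0.5, lam=1.0, gamma=1.0):
    """Illustrative compensatory neuron: one nonlinearity applied to a mix of
    summation and product aggregations. The convex combination with the
    parameter c is an assumption, not the paper's formulation."""
    u_sum = np.dot(w_s, x)             # summation part
    u_prod = np.prod(w_p * x)          # multiplicative (compensatory) part
    u = c * u_sum + (1.0 - c) * u_prod
    return phi(u, lam, gamma)

x = np.array([0.5, -0.2, 0.1])
w = np.array([0.3, 0.8, -0.5])
print(basic_neuron(x, w), compensatory_neuron(x, w, w))
```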

Wavelet Neuron Model: Yamakawa et al. [2] proposed an over-complete system of non-orthogonal smooth wavelet bases in order to approximate a nonlinear function with a smooth function. The shape of the bases is characterized by a shifting parameter b, whose maximum value equals the corresponding scaling parameter a, and the neuron aggregates its inputs as

u = \sum_{i=0}^{N} w_i x_i    (4)

Figure 2 depicts the wavelet neuron model.

Figure 2. Wavelet Neuron Model

Compensatory Wavelet Neuron Model: If, in the compensatory neuron model, the sigmoid function is replaced by a wavelet function, the result is a compensatory wavelet neuron model. Figure 3 presents the schematic of the compensatory wavelet neuron model. This model is defined by Equations (6) and (7), where u is defined by Equation (4).

Figure 3. Compensatory Wavelet Neuron Model

3 Formulation of Various Architectures

The neuron models described in the previous section can be arranged to form neural network architectures for solving different problems. The architecture based on the basic neuron model is referred to as the standard feedforward neural network (STD). The neural network architectures based on compensatory, wavelet and compensatory wavelet neurons are termed the compensatory neural network architecture (CNNA) (Fig. 4), the wavelet neural network architecture (WNNA) (Fig. 5) and the compensatory wavelet neural network architecture (CWNNA), respectively. A modified form of the STD in which only a summation function is used in the output layer is referred to as the modified standard neural network (MSTD).

Figure 4. Compensatory Neural Network Architecture
Figure 5. Wavelet Neural Network Architecture
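To make the wavelet neuron underlying the WNNA and CWNNA concrete, the following sketch expands each input over a small set of shifted and scaled wavelet bases and combines the responses by a weighted summation in the style of Equation (4). This is a sketch under stated assumptions: the Mexican-hat mother wavelet and the function names are illustrative choices, not the compactly supported bases of Yamakawa et al. [2].

```python
import numpy as np

def mexican_hat(t):
    """Assumed mother wavelet (second derivative of a Gaussian)."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def wavelet_neuron(x, w, a, b):
    """Illustrative wavelet neuron: each input x_i is expanded over shifted (b)
    and scaled (a) wavelet bases, then aggregated by a weighted summation.
    Shapes: x (N,), a and b (N, M) for M bases per input, w (N, M)."""
    z = mexican_hat((x[:, None] - b) / a)  # wavelet basis responses
    return np.sum(w * z)                   # Eq. (4)-style summation

x = np.array([0.4, 0.9])
a = np.full((2, 3), 0.5)                          # scaling parameters
b = np.tile(np.array([0.0, 0.25, 0.5]), (2, 1))   # shifts, kept below a as stated above
w = np.random.default_rng(0).normal(size=(2, 3))
print(wavelet_neuron(x, w, a, b))
```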

4 Learning Algorithm

The learning rules correlate the input and output values of the nodes by adjusting the weights of the network. The steepest descent algorithm requires a selection of user-defined parameters, sorted out by trial and error, and is slow to converge. The problem of poor convergence is combated using various acceleration techniques reported in the literature [4][5], but most of these techniques are ad hoc patches. We adopt here scaled conjugate gradient learning (SCG) [6] and self-scaling scaled conjugate gradient learning (SSCG) [1] to train the various neural network models. If the output layer of the neural network model has a summation function only, then the output-layer weight update can be carried out using a linear scheme such as matrix inversion or singular value decomposition, or using the usual backpropagation scheme. If the weights are updated with a backpropagation scheme in conjunction with SCG, this constitutes SSCG learning [1]. It has been shown that this method gives better accuracy for functional mapping and classification problems [1].

Below we give the error gradients for the STD and the CNNA, derived for the error function defined by Equation (9),

E(w(n)) = 0.5 \sum_{k=1}^{K} (d_k - o_k)^2    (9)

All the computations are done in off-line mode: the error gradients for all the patterns are obtained by summing and averaging the error gradients of the individual patterns. For the CNNA, the gradients are computed for the output layer and the input layer. For the standard feedforward neural network, the output layer, hidden layer and input layer weight updates follow by the chain rule from the output-layer delta

\delta_k = -(d_k - o_k) \lambda (\gamma^2 - o_k^2) / (2\gamma)

for a neuron using the activation of Equation (3).
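As a minimal numerical check of these expressions, the sketch below evaluates the batch error of Equation (9) and the output-layer delta above for a single layer using the activation of Equation (3), averaging the per-pattern gradients in off-line mode as described. It uses plain steepest descent rather than the SCG [6] or SSCG [1] schemes, and all names and data are illustrative.

```python
import numpy as np

lam, gamma = 1.0, 1.0

def phi(u):
    return gamma * np.tanh(lam * u / 2.0)          # Eq. (3)

def batch_error_and_grad(W, X, D):
    """Eq. (9) summed over patterns, plus the averaged output-layer gradient.
    X: (P, N) inputs, D: (P, K) desired outputs, W: (N, K) weights."""
    U = X @ W                                      # Eq. (1) for each pattern
    O = phi(U)                                     # actual outputs o_k
    E = 0.5 * np.sum((D - O) ** 2)                 # Eq. (9) summed over patterns
    delta = -(D - O) * lam * (gamma**2 - O**2) / (2 * gamma)  # output-layer deltas
    grad = X.T @ delta / X.shape[0]                # averaged over patterns (off-line mode)
    return E, grad

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(99, 3))
D = phi(X @ rng.normal(size=(3, 1)))               # synthetic targets
W = rng.normal(scale=0.1, size=(3, 1))
for _ in range(200):                               # steepest-descent baseline
    E, g = batch_error_and_grad(W, X, D)
    W -= 0.5 * g
print("final batch error:", E)
```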

Nomenclature: E(n), error at the nth iteration; K, number of neurons in the output layer; N, number of inputs; number of neurons in a layer; d_k, kth desired output; o_k, kth actual output; x, input; y, output of a neuron; w, weights; \lambda, steepness coefficient; \gamma, multiplication factor; n, iteration number; bias.

5 Simulation Studies

The essential components defining a NN are topology, size, functionality, learning algorithm, training/validation, and implementation/realization. The performance measure involves the selection of these features and quantifying, in some form, the success of that selection. The result of the performance evaluation depends significantly on the application. The main factors which decide the superiority of neuron models under supervised learning are the computational burden per iteration/epoch, the number of epochs to convergence, the NN size, generalization/test performance, and the benchmark problems used. Here we analyze first some functional mapping problems and then classification problems.

5.1 Functional Mapping

This may involve mapping from a lower dimensional to a higher dimensional system or vice versa. Essentially, the capability of mapping a function depends upon the neuron model and the architecture of the NN, together with the learning scheme being used. In the following sections we first test on z(x, y) = sin(x) sin(y) and then on a nonlinear time series problem. In all the figures depicting the convergence of the NN, the mean square error (the mean of the error function defined by Equation (9)) is plotted against the number of iterations.

Sin(x)·Sin(y) Problem: General function mapping problems have been used by different researchers to test a NN's capabilities, a learning algorithm's efficiency, and so on. A popular mapping function is z(x, y) = sin(x) sin(y). This function becomes more complex as the norm of the input vector (x, y) grows. We generated a training set of 2500 training patterns by varying the values of x and y in the range [0, 5\pi].

A Chaotic Nonlinear Time Series Problem: Here the training and test sets are generated using the following nonlinear time series equation:

x_{n+1} = \frac{x_n^2}{1 + x_n^2} + 0.5 x_n - 0.5 x_{n-1} + 0.5 x_{n-2}    (22)

with the initial values x_0 = 0.2, x_1 = 0.3 and x_2 = 1.0. The data set consists of 3 inputs and 1 output; the 3 inputs comprise 2 delayed values and the present value of the independent variable. The data set is constructed by deleting the oldest value and adding the newly predicted value. A time series of 101 points was used to construct the training data set, consisting of 99 patterns, as explained above.

5.2 Classification

Any new neuron model and learning algorithm must be tested for its classification capability on benchmark problems. Therefore, to verify the efficacy of the proposed neuron models, we examined them on a few classification problems, such as parity and XOR.

XOR Problem: The exclusive-or (XOR) problem is the classic problem requiring hidden units. The XOR problem, unlike the other logic operations, is not linearly separable. The NN models were trained on the XOR problem and their performance was analyzed in terms of the number of epochs required and the degree of accuracy achieved.

Parity Problem: The N-input parity problem has been a popular benchmark problem among NN researchers such as Minsky and Papert [7]. The problem consists in mapping an N-bit wide binary number to its parity, i.e. if the input pattern contains an odd number of 1s then the parity is 1, else it is 0. We have used it to determine the properties of the neuron models.
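The benchmark data sets of Section 5 can be generated along the following lines. This is a sketch in which the x_n^2/(1 + x_n^2) term of the time-series recursion is an assumed reading of Equation (22), and the variable names are illustrative.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# sin(x)*sin(y) mapping: 2500 training patterns with x, y in [0, 5*pi]
xy = rng.uniform(0.0, 5 * np.pi, size=(2500, 2))
z = np.sin(xy[:, 0]) * np.sin(xy[:, 1])

# Chaotic nonlinear time series (assumed reading of Eq. (22))
def next_value(xn, xnm1, xnm2):
    return xn**2 / (1 + xn**2) + 0.5 * xn - 0.5 * xnm1 + 0.5 * xnm2

series = [0.2, 0.3, 1.0]                 # x0, x1, x2
while len(series) < 101:                 # time series of 101 points
    series.append(next_value(series[-1], series[-2], series[-3]))
series = np.asarray(series)
inputs = np.stack([series[:-3], series[1:-2], series[2:-1]], axis=1)  # 2 delays + present value
targets = series[3:]                                                  # value to predict

# N-bit parity: output 1 if the input pattern has an odd number of 1s, else 0
N = 4
patterns = np.array(list(product([0, 1], repeat=N)))
parity = patterns.sum(axis=1) % 2

print(inputs.shape, targets.shape, patterns.shape, parity[:4])
```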
6 Results and Discussion

The simulation results for the NNs based on the different neuron models are presented in Figs. 6 to 10. Fig. 6 presents the error decay during training for the sin(x)·sin(y) problem. CNNA-4, STD-3-5-1, MSTD-3-5-1, WNNA-36 and CWNNA-15 refer to the CNNA with 4 neurons, the STD with 3, 5 and 1 neurons in the input, hidden and output layers respectively, the MSTD with 3, 5 and 1 neurons in the input, hidden and output layers respectively, the WNNA with 36 neurons (generated out of 8 complete bases) and the CWNNA with 15 neurons (generated out of 5 complete bases), respectively.

Figure 6. M.S. error decay during training for the sin(x)·sin(y) problem
Figure 8. Prediction error for different NN architectures

It may be observed that the convergence of the CWNNA and the CNNA was the best, but the number of neurons in the CNNA was only 4 while that in the CWNNA was 15. This resulted in a computational saving; moreover, fewer parameters (weights) were used to approximate the mapping. The STD and the MSTD have equal numbers of neurons, but the convergence of the MSTD is better except for a small interval where its convergence was slow.

Figs. 7 and 8 present the results for the chaotic time series problem discussed earlier. Here the legends have the same meaning as explained earlier. The CWNNA-21 was generated out of 6 complete wavelet bases, while the WNNA-28 was generated out of 7 complete wavelet bases. It can be observed that the convergence of the STD was better than that of both wavelet models. This conclusion is contrary to the conclusion drawn earlier by Yamakawa et al. [2], because the STD model does not require as many neurons as were used to solve this problem by Yamakawa et al. [2]. The wavelet models showed early convergence, but the ultimate convergence was better for the other models. It should be noted that the error decay of the wavelet models can be made faster and better, but only at the cost of increased computation. It is evident from Fig. 7 that the wavelet models were computationally costly, due to their large numbers of weights and neurons compared with the other models; a further increase in the number of neurons would make the wavelet models even costlier. The prediction was best for the CNNA, which was also computationally cheaper than the wavelet models, the STD and the MSTD.

Figure 7. M.S. error decay during training of the STD, MSTD, CNNA, WNNA and CWNNA for the time series problem

Similarly, it can be observed that for the classification problems (XOR and parity) the performance of the compensatory wavelet model was on a par with that of the wavelet model, while the amount of computation involved in the former was less than that in the latter (Figs. 9 and 10). The compensatory neural network performed best while involving the least amount of computation.

Figure 9. M.S. error decay during training for the XOR problem (CNNA-1, CWNNA-6, WNNA-6, STD-2-1-1, MSTD-2-1-1)

Figure 10. M.S. error decay during training for the parity problem

7 Conclusion

A compensatory neuron model and a compensatory wavelet neuron model were proposed in this paper. These models serve as the basis for the formulation of the compensatory neural network and compensatory wavelet neural network architectures. It is concluded that the compensatory models are much superior to the other models. Moreover, the modified standard neural network (MSTD) is also much superior to the wavelet model.

References

[1] Sinha, M., Kumar, K., and Kalra, P. K., "Some New Neural Network Architectures with Improved Learning Schemes," to appear in Soft Computing, Springer-Verlag.
[2] Yamakawa, T., Uchino, E., and Samatsu, T., "Wavelet Neural Network Employing Over-Complete Number of Compactly Supported Non-orthogonal Wavelets and their Applications," Proceedings of the IEEE International Conference on Neural Networks, June 28-July 2, 1994, pp. 1391-1396.
[3] Hassoun, M. H., Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 1995.
[4] Jacobs, R. A., "Increased Rates of Convergence Through Learning Rate Adaptation," Neural Networks, Vol. 1, 1988, pp. 295-307.
[5] Hush, D. R., and Salas, J. M., "Improving the Learning Rate of Back-Propagation with the Gradient Reuse Algorithm," Proceedings of the IEEE International Conference on Neural Networks, Vol. I, 1988, pp. 441-447.
[6] Møller, M. F., "A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning," Neural Networks, Vol. 6, 1993, pp. 525-533.
[7] Minsky, M. L., and Papert, S., Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, MA, 1969.