
CHAPTER 5
RADIAL BASIS FUNCTION (RBF) NEURAL NETWORKS FOR TOOL WEAR MONITORING

This chapter presents an overview of radial basis function neural networks and their application to tool wear monitoring. The centers of the RBF units have been fixed using three different approaches and their learning characteristics are analysed. The performance of RBF neural networks has been compared with that of the MLP for tool wear monitoring.

5.1 Radial Basis Function (RBF) neural networks

There are different ways of designing a supervised neural network. The back-propagation algorithm for the design of a multi-layer perceptron can be viewed as an application of an optimization method known in statistics as stochastic approximation. Another approach is to view the design of a neural network as a curve-fitting problem in a high-dimensional space. This involves finding a surface in a multi-dimensional space that provides a best fit to the training data, with the criterion for "best fit" being measured in some statistical sense. Correspondingly, generalization is equivalent to the use of this multi-dimensional surface to interpolate the test data. This is the motivation behind the method of radial basis functions. In the context of a neural network, the hidden units provide a set of 'functions' that constitute an arbitrary 'basis' for the input patterns (vectors) when they are expanded into the hidden-unit space; these functions are therefore called 'radial basis functions'. Broomhead and Lowe (1988) were the first to exploit the use of radial basis functions in the design of neural networks. Other major contributions to the theory, design and application of RBFs include papers by Moody and Darken (1989), Renals (1989) and Poggio and Girosi (1990) [76]. The main advantages claimed for the RBF model are its simplicity and ease of implementation, and the learning and generalization abilities of these networks are excellent.

The construction of a radial basis function network in its most basic form involves three entirely different layers. The input layer is made up of source nodes (sensory units).

The second layer is a hidden layer of high enough dimension, which serves a different purpose from that in a multi-layer perceptron. The output layer supplies the response of the network to the activation patterns applied to the input layer. The transformation from the input space to the hidden-unit space is non-linear, whereas the transformation from the hidden-unit space to the output space is linear. Fig. 5.1 shows the typical RBF architecture.

Fig. 5.1 RBF architecture (input layer, hidden layer, output layer)

The RBF network is a single hidden-layer feed-forward neural network. Each node of the hidden layer has two parameters, a center x_j and a width σ_j. The center is compared with the network input vector to produce a radially symmetric response, while the width controls the smoothness properties of the interpolating function. The responses of the hidden layer are scaled by the connection weights of the output layer and then combined to produce the network output. RBFs have been shown to have universal approximation capability.
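As an illustration of this forward computation, the following minimal Python sketch evaluates the Gaussian hidden responses and the weighted output for one input vector; the function name, array shapes and toy values are illustrative assumptions, not part of the thesis.

```python
import numpy as np

# Illustrative sketch of the RBF forward pass described above: hidden unit j
# responds with a Gaussian of its distance to the center x_j, and the output
# layer combines these responses linearly through its connection weights.
def rbf_forward(x, centers, widths, weights):
    d2 = ((centers - x) ** 2).sum(axis=1)      # squared distances ||x - x_j||^2
    phi = np.exp(-d2 / (2.0 * widths ** 2))    # radially symmetric responses
    return weights @ phi                       # linear output layer

# Toy usage: three hidden units, a two-dimensional input (values are arbitrary).
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]])
widths = np.array([0.9, 0.9, 0.9])
weights = np.array([0.3, -0.1, 0.7])
print(rbf_forward(np.array([0.4, 0.3]), centers, widths, weights))
```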

In the classical approach to RBF network implementation, the basis functions are usually chosen as Gaussian and the number of hidden units is fixed a priori based on some properties of the input data. The weights connecting the hidden and output units are estimated by linear least-squares methods, e.g., the least mean square (LMS) algorithm. The disadvantage of the classical approach is that it is not suitable for sequential learning and it usually also results in too many hidden units. The RBF network requires less computation time for learning and produces a more compact topology than other neural networks [76].

5.2 Learning Strategies

The learning process undertaken by a radial basis function (RBF) network may be visualized as follows. The linear weights associated with the output unit(s) of the network tend to evolve on a different 'time scale' compared to the non-linear activation functions of the hidden units. Thus, as the hidden layer's activation functions evolve slowly in accordance with some non-linear optimization strategy, the output layer's weights adjust themselves rapidly through a linear optimization strategy. Since different layers of an RBF network perform different tasks, it is reasonable to separate the optimization of the hidden and output layers of the network by using different techniques, perhaps operating on different time scales (Lowe, 1991).

There are different learning strategies available for the design of an RBF network, depending on how the centers of the radial basis functions of the network are specified. Three different learning strategies are discussed below.

5.2.1 Fixed Centers Selected at Random

The simplest approach is to assume fixed radial basis functions defining the activation functions of the hidden units. Specifically, the locations of the centers may be chosen randomly from the training data set. The RBFs use the Gaussian activation function, defined as φ_j(x) = exp(−||x − x_j||² / 2σ_j²), j = 1, 2, ..., c, where x_j is the center, σ_j is the width (standard deviation) and c is the number of centers. The only parameters that need to be learned in this approach are the linear weights in the output layer of the network. The weights are learned using a simple LMS algorithm or the gradient descent approach, as sketched below.
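A minimal sketch of this first strategy follows: centers fixed at randomly chosen training patterns, a common fixed width, and only the output-layer weights trained by an LMS-style gradient step. The function name, width value, learning rate and data sizes are illustrative assumptions.

```python
import numpy as np

# Fixed centers selected at random: only the output-layer weights are learned.
def train_fixed_random_centers(X, y, n_centers=10, width=0.9, lr=0.05,
                               epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]  # random centers
    widths = np.full(n_centers, width)                              # fixed widths
    w = rng.normal(scale=0.1, size=n_centers)                       # output weights
    for _ in range(epochs):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * widths ** 2))      # hidden responses, shape (N, c)
        out = phi @ w                                # linear output unit
        w += lr * phi.T @ (y - out) / len(X)         # LMS-style gradient step
    return centers, widths, w

# Toy usage on random data (12 features, as in the data sets used later).
rng = np.random.default_rng(1)
X, y = rng.random((69, 12)), rng.random(69)
train_fixed_random_centers(X, y)
```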

5.2.2 Self-Organized Selection of Centers

In this approach, the radial basis functions are permitted to move the locations of their centers in a self-organized fashion, while the weights of the output layer are computed using a supervised learning rule. The network thus undergoes a hybrid learning process (Moody & Darken, 1989; Lippmann, 1989). The self-organized component of the learning process serves to allocate network resources in a meaningful way by placing the centers of the radial basis functions in only those regions of the input space where significant data are present. The self-organized selection of the centers is done using clustering algorithms such as batch fuzzy c-means. The next step is the determination of the width parameters σ_j of the basis functions, which must be chosen using some other procedure. An effective scheme for finding the widths is the P-nearest neighbor heuristic (Moody & Darken, 1989). Consider a given center vector x_j (j = 1, ..., c) and assume x_j1, x_j2, ..., x_jP (1 <= j1, j2, ..., jP <= c) are its P nearest neighboring centers. The width of the basis function σ_j is given by the RMS distance of the given cluster center x_j to the P nearest neighboring centers:

σ_j = sqrt( (1/P) Σ_{p=1}^{P} ||x_j − x_jp||² ) [64].

Another heuristic is to choose all the σ_j to be equal. This ensures that the basis functions overlap to some degree and hence give a relatively smooth representation of the distribution of training data [66].
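The P-nearest-neighbor width heuristic above can be sketched as follows; the function name and example centers are illustrative assumptions.

```python
import numpy as np

# Width of each basis function = RMS distance from its center to the
# P nearest neighboring centers (P-nearest-neighbor heuristic).
def pnn_widths(centers, P=2):
    diffs = centers[:, None, :] - centers[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)            # squared pairwise distances
    np.fill_diagonal(dist2, np.inf)             # exclude the center itself
    nearest2 = np.sort(dist2, axis=1)[:, :P]    # P nearest squared distances
    return np.sqrt(nearest2.mean(axis=1))       # RMS distance per center

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(pnn_widths(centers, P=2))
```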

5.2.3 Supervised Selection of Centers

In this approach the centers of the radial basis functions and all other free parameters of the network undergo a supervised learning process. A gradient-descent procedure is used to accomplish the task. The instantaneous value of the cost function is defined as E = (1/2) Σ_{i=1}^{n} E_i², where n is the number of training examples used to undertake the learning process and E_i is the error signal defined by E_i = O_k − Σ_{j=1}^{c} w_kj exp(−||ξ_i − x_j||² / 2σ_j²), where K is the number of nodes in the output layer and c is the number of centers in the hidden layer (j = 1, 2, ..., c). The requirement is to find the free parameters w_kj, x_j and σ_j so as to minimize E [76].

5.3 Comparison of RBF networks and Multi-layer Perceptrons

Radial basis function networks and multi-layer perceptrons play very similar roles in that they both provide techniques for approximating arbitrary non-linear functional mappings between multi-dimensional spaces. Both are examples of non-linear layered feed-forward networks and both are universal approximators. The comparison between the two networks is given in Table 5.1.

Table 5.1: Comparison of MLP and RBF Networks
Sl. No | MLP | RBF
1 | Has more hidden layers | Has a single hidden layer
2 | Uses a monotonic sigmoidal function | Uses a non-monotonic Gaussian function
3 | Hidden and output layers share a common neuron model | Hidden and output layers use different neuron models
4 | Computes inner products | Computes the Euclidean norm
5 | Uses global error for minimization | Uses local error for minimization

5.4 Training & Testing the RBF network

In order to train the RBF neural network, three different methods have been selected for initializing the centers of the RBF units. The En-8 data set has been considered for this purpose and the results obtained are presented in detail; for the other two data sets, namely grey cast iron and En-2A, the results obtained have been tabulated.

5.4.1 Centers of the RBF units initialized randomly

(a) Training phase - The input patterns have a dimension of 12 and the output is of dimension 1. The complete training set consists of 69 patterns. Sample training and test patterns are given in Table 5.2.

Table 5.2: Sample Input Patterns (En-8)

The RBF units for training the network are selected arbitrarily and each unit is assigned an input pattern randomly, thus initializing the centers. The algorithm to train the network is given below.

Algorithm:
1. Select the number of RBF units arbitrarily.
2. Initialize their centers from the input data randomly.
3. Set Etot = 0.
4. Choose the input-output pair (ξ_i, ζ_k), where i = 1, 2, ..., n indexes the training patterns, each pattern has p input features, and k = 1 is the single output feature.
5. Compute the hidden layer output v_j = exp(−||ξ_i − x_j||² / 2σ_j²), where x_j is the center and σ_j is the width of the RBF unit.
6. Compute the output using O_k = 1 / (1 + exp(−Σ_j w_kj v_j)).
7. Compute the square error E = (O_k − ζ_k) * (O_k − ζ_k) and set Etot = Etot + E.
8. The change in the output layer weights is calculated as follows:
   δ_k = (O_k − ζ_k) * O_k * (1 − O_k)
   Δw_kj = δ_k * v_j * η * α
   w_kj^new = w_kj^old + Δw_kj
9. If Etot > Emin, then go to Step 3.
10. Save the weights, centers and widths and exit.
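The training procedure above can be sketched in Python as follows. The data, learning rate, width and stopping values are illustrative assumptions, and the momentum coefficient is folded into a single step size, so this is a sketch of the update rule rather than the thesis's exact implementation.

```python
import numpy as np

# Sketch of the training algorithm above: random centers, Gaussian hidden
# units, a sigmoid output unit and per-pattern delta-rule weight updates.
rng = np.random.default_rng(0)
X = rng.random((69, 12))            # e.g. 69 training patterns, 12 features
t = rng.random(69)                  # flank-wear targets scaled into (0, 1)

c, eta, e_min = 20, 0.85, 1e-3
centers = X[rng.choice(len(X), c, replace=False)]   # Step 2: random centers
sigma = np.full(c, 0.9)                             # fixed widths (example value)
w = rng.normal(scale=0.1, size=c)                   # output-layer weights

for epoch in range(1000):
    e_tot = 0.0                                           # Step 3
    for xi, ck in zip(X, t):                              # Step 4: pattern loop
        v = np.exp(-((xi - centers) ** 2).sum(axis=1) / (2 * sigma ** 2))  # Step 5
        ok = 1.0 / (1.0 + np.exp(-(w @ v)))               # Step 6: sigmoid output
        e_tot += (ok - ck) ** 2                           # Step 7: squared error
        delta = (ck - ok) * ok * (1.0 - ok)               # Step 8: delta rule
        w += eta * delta * v                              # descent on the error
    if e_tot <= e_min:                                    # Step 9: stopping test
        break
```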

The simulation parameters η = 0.85 and α = 0.05 are maintained constant for all the studies. The training of the network has been done with different numbers of RBF units. In one set of studies the widths of the RBF units are determined using the P-nearest neighbor heuristic (each RBF unit has a different width value); in another, the width has been kept constant for all the RBF units. The upper limit on the learning cycle has been kept at 1,75,000 epochs to observe the network convergence behavior. For any parameter setting, if the network takes more epochs than this set value, it is considered non-convergent, the network parameters are set to new values and training is restarted. Table 5.3 shows the error reached during the training of the RBF network with different numbers of RBF units.

Table 5.3: Variation of Error with Different Number of RBF Units (No. of RBF units vs. error)

It is evident that the error decreases with an increase in the number of RBF units and it is minimum for 66 RBF units, beyond which the error increases. Similarly, the optimum number of RBF units has been determined for the grey cast iron and En-2A data sets. Table 5.4 shows the optimum number of RBF units for the three data sets.

Table 5.4: Optimum Number of RBF Units (random selection of centers)
Data set | No. of Centers | Error
En-8 steel | |
Grey cast iron | |
En-2A steel | |

Fig. 5.2 shows the variation of error with the number of epochs for 66 centers.

Fig. 5.2 Variation of Error with Epochs (12-66-1, En-8, random selection of centers)

It is clear that the error reduces gradually with epochs. Fig. 5.3 shows the final network architecture.

Fig. 5.3 RBF: Final network architecture (12-66-1), centers selected randomly

The results presented so far are for varying widths, which have been determined using the P-nearest neighbor heuristic. In another study the widths have been kept constant for all the RBF units and the network has been trained for different values of the width.

Table 5.5 shows the error reached during training of the RBF network with different values of the width for 66 RBF units.

Table 5.5: Variation of Error with Different Width Values for 66 RBF Units (width value vs. error)

It is clear from the table that the error increased with the increase in the value of the width. The optimum value of the width has been chosen as 0.9, based on the performance of the network on both training and test data. The network has also been trained with width = 0.9 for different numbers of RBF units. It is seen that with the increase in the number of RBF units the error decreases, and the error is minimum when the number of centers is equal to the number of data in the training set.

(b) Testing phase: In the testing phase, 15 data samples have been selected. The trained RBF network with different numbers of RBF units has been tested with these data samples. The network output specifies the condition of the tool in terms of flank wear on the tool. For the sake of analysis of the network output from the tool wear monitoring point of view, only two states have been considered. Based on the flank wear values, the network output can be interpreted as follows:
(i) if (network output <= 0.4) the tool is in the 'Normal' condition;
(ii) if (network output > 0.4) the tool is in the 'Abnormal' condition and hence the tool should be replaced.
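This two-state decision rule can be expressed directly; the function name below is illustrative, and the 0.4 threshold is the one stated above.

```python
def tool_condition(network_output: float, threshold: float = 0.4) -> str:
    """Interpret the RBF network's flank-wear estimate as a tool state."""
    return "Normal" if network_output <= threshold else "Abnormal - replace the tool"

print(tool_condition(0.27))   # Normal
print(tool_condition(0.55))   # Abnormal - replace the tool
```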

Table 5.6 shows the results of testing on seen and unseen data for different numbers of RBF units.

Table 5.6: Performance of RBF Network with Random Selection of Centers (No. of RBF units vs. training and testing accuracy)

It is clear from the table that as the number of RBF units increases, the performance of the network improves. Table 5.7 shows the sample testing results for the RBF network with 66 hidden units.

Table 5.7: Sample Testing Results (12-66-1, En-8, random selection of centers) (pattern no., desired output (mm), network output (mm) and classification error for training and test data)

Table 5.8 shows the performance of the network for different values of the width for 66 RBF units.

Table 5.8: Performance of RBF Network for Different σ_j (En-8) (width value σ_j vs. training and testing accuracy)

It is evident from the table that the overall network performance on training and test data improves with an increase in the width value and is maximum for 0.9, beyond which it reduces. Table 5.9 shows the sample testing results for training and test data for width = 0.9.

Table 5.9: Sample Testing Results (12-66-1, width: 0.9, En-8) (pattern no., desired output (mm), network output (mm) and classification error for training and test data)

The results for all three data sets are presented in Table 5.10 below.

Table 5.10: Performance of RBF network for three data sets (Random Selection)
Data set | No. of RBF units | Error | Training accuracy | Testing accuracy
En-8 steel | | | 97 % | 93 %
Grey cast iron | | | 96 % | 95 %
En-2A steel | | | 77 % | 7 %

All the data sets except the En-2A data set exhibit good generalization. The poor performance of the RBF network on the En-2A data set could be due to an insufficient number of patterns representing all possible tool wear states. In the second approach, the batch fuzzy c-means algorithm has been used to determine the RBF units.

5.4.2 Center initialization using the batch fuzzy c-means algorithm

In this study, the batch fuzzy c-means algorithm has been used to initialize the centers of the RBF network. The number of RBF units has been fixed arbitrarily. This is a clustering technique which assigns the feature vectors x_i to c clusters, represented by prototypes v_j. The certainty of the assignment of a feature vector x_i to the various clusters is measured by the membership functions u_j(x_i) = u_ij ∈ [0, 1], 1 <= j <= c, which satisfy the property Σ_{j=1}^{c} u_ij = 1. The M x c matrix U = [u_ij] is a fuzzy partition in the set u defined as u = { U ∈ R^{M x c} : u_ij ∈ [0, 1] for all i, j; Σ_{j=1}^{c} u_ij = 1 for all i; 0 < Σ_{i=1}^{M} u_ij < M for all j }, where x_i ∈ R^n are the M feature vectors. This has been used to iteratively select the optimum number of centers for the RBF network. The algorithm is presented below [36].

Algorithm:
1. Select c, the number of centers, and set m = 2 (the fuzziness parameter), Emin and iter = 0.
2. Generate an initial set of prototypes {v_1,0, v_2,0, ..., v_c,0}.
3. iter = iter + 1;
   u_ij,iter = [ Σ_{l=1}^{c} ( ||x_i − v_j,iter−1||² / ||x_i − v_l,iter−1||² )^{1/(m−1)} ]^{−1}, 1 <= i <= M, 1 <= j <= c;
   v_j,iter = Σ_{i=1}^{M} (u_ij,iter)^m x_i / Σ_{i=1}^{M} (u_ij,iter)^m, 1 <= j <= c;
   E_iter = Σ_{j=1}^{c} ||v_j,iter − v_j,iter−1||.
4. If iter < N (the maximum number of iterations) and E_iter > Emin, then go to step 3.

The centers so determined have been used to train the RBF network. In batch algorithms all the prototypes are updated together.
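A minimal Python sketch of this batch update (m = 2 by default) is given below; the function name, stopping test and toy data are assumptions for illustration rather than the thesis's implementation.

```python
import numpy as np

# Batch fuzzy c-means: memberships and all prototypes are updated together.
def batch_fcm(X, c, m=2.0, e_min=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]               # initial prototypes
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # u_ij = [ sum_l (||x_i - v_j||^2 / ||x_i - v_l||^2)^(1/(m-1)) ]^-1
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]           # prototype update
        if np.linalg.norm(V_new - V, axis=1).sum() <= e_min:   # E_iter test
            return V_new
        V = V_new
    return V

X = np.random.default_rng(1).random((69, 12))   # e.g. 69 patterns, 12 features
centers = batch_fcm(X, c=50)                    # centers for the RBF hidden layer
```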

(a) Training phase: The training data patterns are described by 12 input features and 1 output feature. Since the cutting conditions have similar values for the various patterns, they do not contribute much to the center selection using the batch fuzzy c-means algorithm and have not been included for the grey cast iron and En-2A data sets; the number of input features has therefore been reduced to 10 for both these data sets. The simulation parameters η = 0.85 and α = 0.05 have been maintained constant for all the experiments. The batch fuzzy c-means algorithm has been run with different numbers of centers, and the final optimal centers obtained after convergence have been used to train the RBF network. A target error has been fixed for training the batch fuzzy c-means algorithm, and the trained RBF network has been able to recognize most of the training patterns. Table 5.11 shows the results of applying the batch fuzzy c-means algorithm to the three data sets.

Table 5.11: Results of applying the batch fuzzy c-means algorithm to three data sets
Data set | Number of centers | No. of Epochs
En-8 steel (12 input features) | |
Grey cast iron (10 input features) | |
En-2A steel (10 input features) | |

Table 5.12 shows the variation of error with the number of RBF units.

Table 5.12: Variation of Error with Number of RBF Units (No. of centers or RBF units vs. error)

It is clear that the error decreases with an increase in the number of RBF units and it is minimum for 50 RBF units, beyond which the error increases. Fig. 5.4 shows the variation of error with the number of epochs for 50 RBF units.

Fig. 5.4 Variation of Error with Epochs (12-50-1, En-8, batch fuzzy c-means)

The drop in the error with epochs is gradual when compared to that exhibited by the RBF network trained using randomly initialized centers. Fig. 5.5 shows the final RBF architecture.

Fig. 5.5 RBF: Final network architecture (12-50-1), centers initialized using the batch fuzzy c-means algorithm

(b) Testing phase: The trained network with different numbers of RBF units has been tested with patterns from the training data set as well as with patterns that were not used for training.

Table 5.13 shows the results of testing on both training and test data for different numbers of centers used in the network.

Table 5.13: Performance of RBF network with selection of centers through the batch fuzzy c-means algorithm (Input features: 12, En-8)
No. of RBF units | Training accuracy | Testing accuracy
20 | 96 % | 87 %
30 | 94 % |
50 | 97 % |

It is clear from the table that as the number of RBF units increases, the performance of the network on training and test data improves up to 50 units, beyond which there is no change in the performance. Table 5.14 shows the sample testing results for seen and unseen data for 50 RBF units.

Table 5.14: Sample testing results (12-50-1, En-8, batch fuzzy c-means) (pattern no., desired output (mm), network output (mm) and classification error for training and test data)

Table 5.15 shows the results for all three data sets.

Table 5.15: Performance of RBF network using the batch fuzzy c-means algorithm for center initialization for three data sets
Data set | No. of RBF units | Error | Training accuracy | Testing accuracy
En-8 steel (12 input features) | | | 97 % | 90 %
Grey cast iron (10 input features) | | | 85 % |
En-2A steel (10 input features) | | | 80 % |

The table shows that with 10 features in the input data set there has not been much degradation in the performance of the network. A comparison with Table 5.7 also reveals that the number of centers required for the desired level of performance by the RBF network is smaller than with random selection of centers. In the next learning strategy, gradient descent has been used to adapt all the adjustable parameters of the RBF network.

5.4.3 Center initialization using the gradient descent approach

In this strategy, the radial basis function network has been trained using the gradient descent approach, and all the adjustable parameters of the network are adapted by it. Karayiannis [37] proposed a supervised learning algorithm based on gradient descent for training reformulated RBF neural networks. Experiments involved a variety of reformulated RBF networks generated by linear and exponential generator functions, and indicated that gradient descent learning is simple, easily implementable and produces RBF networks that perform considerably better than conventional RBF models trained by existing algorithms. The algorithm for gradient descent learning is presented below.

Algorithm:
1. Select the number of centers, the learning constant η, the momentum coefficient α and N (the maximum number of epochs). Randomly initialize the weights to small values. The centers are selected randomly from the input data. The width of the hidden units is fixed at the beginning (constant for all the units).
2. Set Epoch = 0, Etot = 0 and present the first training pattern (ξ_i, ζ_k).
3. Compute the hidden layer responses v_j = exp(−||ξ_i − x_j||² / 2σ_j²). The network output is given by O_k = 1 / (1 + exp(−Σ_j w_kj v_j)).
4. Compute the square error E_n = (ζ_k − O_k) * (ζ_k − O_k) and set Etot += E_n (n is the index of the pattern).
5. Update the adjustable parameters, i.e. the weights of the output layer, using the gradient descent method:
   δ_k = (O_k − ζ_k) * O_k * (1 − O_k), Δw_kj = δ_k * v_j * η * α, w_kj^new = w_kj^old + Δw_kj.
6. Present the next training pattern and go to Step 3.
7. After a fixed number of epochs (one epoch is the presentation of all the patterns in the training data set), update the center values using the gradient descent approach as follows:
   x_j^new = x_j^old + η * α * Σ_i δ_j (ξ_i − x_j), where δ_j = (1 / σ_j²) * exp(−||ξ_i − x_j||² / 2σ_j²) * Σ_{k=1}^{K} δ_k w_kj.
   Update the widths using the P-nearest neighbor heuristic.
8. Present the first training pattern and go to Step 3.
9. If Etot < Emin or Epoch > N, stop.
10. Save the weights, centers and widths and exit.

(a) Training phase - The training of the network has been carried out on the En-8 data set. The widths σ_j have been kept constant at 0.9 initially for all the hidden units. The centers are adjusted every 5000 epochs and the corresponding widths are then determined using the P-nearest neighbor heuristic; that is, the network training started with equal widths for all the hidden units, which were then adjusted. This methodology has been adopted to achieve the desired accuracy.
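The following Python sketch illustrates this strategy under the same assumptions as the earlier sketches (illustrative data, rates and schedules); the periodic center update uses the standard gradient of the Gaussian with respect to its center, and the width re-estimation step is omitted for brevity.

```python
import numpy as np

# Gradient-descent strategy: output weights are updated per pattern, and the
# centers are nudged along the error gradient at fixed epoch intervals (the
# thesis adjusts centers every 5000 epochs; a smaller interval is used here
# purely to keep the sketch quick to run).
rng = np.random.default_rng(0)
X, t = rng.random((69, 12)), rng.random(69)
c, eta, update_every = 60, 0.05, 250
centers = X[rng.choice(len(X), c, replace=False)]
sigma = np.full(c, 0.9)
w = rng.normal(scale=0.1, size=c)

for epoch in range(1, 1001):
    for xi, ck in zip(X, t):
        diff = xi - centers
        v = np.exp(-(diff ** 2).sum(axis=1) / (2 * sigma ** 2))
        ok = 1.0 / (1.0 + np.exp(-(w @ v)))
        delta_k = (ck - ok) * ok * (1.0 - ok)          # output-layer delta
        w += eta * delta_k * v                          # weight update
        if epoch % update_every == 0:                   # periodic center update:
            # gradient of the Gaussian responses with respect to each center
            centers += eta * (delta_k * w * v / sigma ** 2)[:, None] * diff
```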

Table 5.16 shows the variation of error with the number of RBF units.

Table 5.16: Variation of Error with Number of RBF Units (No. of RBF units vs. error)

It is clear that for 60 RBF units the error converged to by the network during training is minimum. Fig. 5.6 shows the final RBF network architecture.

Fig. 5.6 RBF: Final network architecture (12-60-1), centers initialized using the gradient descent approach

(b) Testing phase - The final RBF network architecture has been tested using the test samples. Table 5.17 shows the testing results on the test samples for different numbers of centers.

Table 5.17: Performance of RBF network with initialization of centers using gradient descent
No. of RBF units | Error | Training accuracy | Testing accuracy
 | | | 73 %
 | | | 87 %
 | | | 87 %
 | | | 80 %

From the table it is clear that the optimal performance, in terms of classification accuracy on the test samples and error, has been achieved for 60 RBF units. Table 5.18 shows the sample testing results for 60 centers.

Table 5.18: Sample Testing Results (12-60-1, gradient descent) (pattern no., desired output (mm), network output (mm) and classification error for training and test data)

The network architecture has been able to recognize 91 % of the training patterns and 87 % of the test patterns respectively.

5.5 Discussion

Investigations have been carried out on RBF neural networks using three approaches for center initialization, namely random selection from the training data, the batch fuzzy c-means algorithm and the gradient descent approach. Random selection of centers from the training data set requires a lot of trials to establish the right number of centers, whereas the batch fuzzy c-means algorithm helps in determining the optimum number of centers for the desired performance. The use of the batch fuzzy c-means algorithm for establishing the centers is a robust method and will always guarantee good performance,

because the membership functions used determine the strength of attraction between the centers and the input vectors. The use of batch fuzzy c-means will prevent deterioration in the performance of the neural network due to noisy data, which is possible in a practical shop-floor environment. The gradient descent approach adapts all the network parameters simultaneously. Though the error does not converge to a global minimum during learning, the developed network architecture has been able to recognize most of the training and test patterns. Table 5.19 shows a comparative evaluation of the fixed RBF network with the three methods of center initialization and the Multi-Layer Perceptron network for the En-8 data set.

Table 5.19: Comparative Evaluation of RBF Network and MLP (En-8)
Network | Network architecture | Training data accuracy | Test data accuracy | No. of epochs
RBF network (i) Random initialization | 12-66-1 | 97 % | 93 % |
RBF network (ii) Batch fuzzy c-means | 12-50-1 | 97 % | 93 % |
RBF network (iii) Gradient descent | 12-60-1 | 91 % | 87 % |
MLP | | 100 % | 87 % |

The MLP takes less training time when compared to the RBF networks. The performance of the two networks is comparable, as both are robust and accurate in estimating the flank wear values. The MLP requires fewer hidden units than the RBF networks for the same level of performance. The generalization capability of the RBF networks, for all three methods of center initialization, has been found to be better than that of the MLP. The RBF networks are able to generalize well depending on the method adopted for center initialization. The developed network architecture has been able to classify almost all of the seen patterns.

The network has been able to classify 93 % of the unseen patterns for random initialization and the batch fuzzy c-means algorithm, and 87 % of the unseen patterns for the gradient descent approach, indicating that RBF neural networks are a powerful architecture for pattern classification.

5.6 Conclusion

In this chapter,
1. RBF neural networks have been applied to tool wear monitoring for determining the tool status in face milling operations.
2. Three center initialization strategies for the RBF units in the hidden layer have been investigated.
3. For each center initialization approach and width selection, a number of experiments have been conducted to train the network and the learning characteristics have been thoroughly analyzed.
4. RBF neural networks have been found to exhibit better learning and generalization abilities in estimating the flank wear values, for any of the center initialization strategies, when compared to the MLP.
5. RBF neural networks have been effective in monitoring the condition of the tool during face milling operations with 93 % accuracy when unseen signal patterns are presented to the network.

In the next chapter, the application of the Resource Allocation Network for tool wear monitoring is presented.


More information

MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms

MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms 1 Introduction In supervised Machine Learning (ML) we have a set of data points

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Deep Neural Networks Optimization

Deep Neural Networks Optimization Deep Neural Networks Optimization Creative Commons (cc) by Akritasa http://arxiv.org/pdf/1406.2572.pdf Slides from Geoffrey Hinton CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy

More information

Neural Networks. Theory And Practice. Marco Del Vecchio 19/07/2017. Warwick Manufacturing Group University of Warwick

Neural Networks. Theory And Practice. Marco Del Vecchio 19/07/2017. Warwick Manufacturing Group University of Warwick Neural Networks Theory And Practice Marco Del Vecchio marco@delvecchiomarco.com Warwick Manufacturing Group University of Warwick 19/07/2017 Outline I 1 Introduction 2 Linear Regression Models 3 Linear

More information

Clustering in Ratemaking: Applications in Territories Clustering

Clustering in Ratemaking: Applications in Territories Clustering Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, PhD FIA ASTIN 13th-16th July 2008 INTRODUCTION Structure of talk Quickly introduce clustering and its application in insurance ratemaking

More information

STEREO-DISPARITY ESTIMATION USING A SUPERVISED NEURAL NETWORK

STEREO-DISPARITY ESTIMATION USING A SUPERVISED NEURAL NETWORK 2004 IEEE Workshop on Machine Learning for Signal Processing STEREO-DISPARITY ESTIMATION USING A SUPERVISED NEURAL NETWORK Y. V. Venkatesh, B. S. Venhtesh and A. Jaya Kumar Department of Electrical Engineering

More information

Fuzzy Segmentation. Chapter Introduction. 4.2 Unsupervised Clustering.

Fuzzy Segmentation. Chapter Introduction. 4.2 Unsupervised Clustering. Chapter 4 Fuzzy Segmentation 4. Introduction. The segmentation of objects whose color-composition is not common represents a difficult task, due to the illumination and the appropriate threshold selection

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,

More information

THREE PHASE FAULT DIAGNOSIS BASED ON RBF NEURAL NETWORK OPTIMIZED BY PSO ALGORITHM

THREE PHASE FAULT DIAGNOSIS BASED ON RBF NEURAL NETWORK OPTIMIZED BY PSO ALGORITHM THREE PHASE FAULT DIAGNOSIS BASED ON RBF NEURAL NETWORK OPTIMIZED BY PSO ALGORITHM M. Sivakumar 1 and R. M. S. Parvathi 2 1 Anna University, Tamilnadu, India 2 Sengunthar College of Engineering, Tamilnadu,

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information