TRAINING SIMULTANEOUS RECURRENT NEURAL NETWORK WITH RESILIENT PROPAGATION FOR STATIC OPTIMIZATION


Gursel Serpen* and Joel Corra
Electrical Engineering and Computer Science, The University of Toledo, Toledo, OH, USA

ABSTRACT

This paper proposes a non-recurrent training algorithm, resilient propagation, for the Simultaneous Recurrent Neural network operating in relaxation mode for computing high-quality solutions of static optimization problems. Implementation details related to adaptation of the recurrent neural network weights through the non-recurrent training algorithm, resilient backpropagation, are formulated through an algebraic approach. Performance of the proposed neuro-optimizer on a well-known static combinatorial optimization problem, the Traveling Salesman problem, is evaluated on the basis of computational complexity measures and, subsequently, compared to the performance of the Simultaneous Recurrent Neural network trained with standard backpropagation and with recurrent backpropagation for the same static optimization problem. Simulation results indicate that the Simultaneous Recurrent Neural network trained with the resilient backpropagation algorithm is able to locate superior-quality solutions through a comparable amount of computational effort for the Traveling Salesman problem.

Keywords: recurrent neural network, resilient propagation, traveling salesman problem, neural network training, static optimization, nonlinear neural network dynamics, backpropagation, recurrent backpropagation.

* Corresponding author: phone (419) , fax (419) , and gserpen@eng.utoledo.edu

Introduction

The Simultaneous Recurrent Neural network (SRN) has recently been proposed as a superior algorithm among the family of neural optimizers for addressing large-scale static optimization problems, particularly ones from the domain of combinatorial optimization [Serpen et al., 2001; Werbos and Pang, 1996]. The SRN is a trainable algorithm when configured in relaxation mode to function as a static optimizer. The property of trainability appears to give the SRN a significantly superior capability to focus efficiently on the promising regions of the overall search space of a given problem, leading to dramatic performance improvements with respect to solution quality compared to other neural algorithms, including those in the family of Hopfield networks [Hopfield and Tank, 1985] and its derivatives, such as the Boltzmann Machine, the Mean Field Annealing network, and others [Cichocki and Unbehauen, 1993]. Earlier studies using the SRN, trained with the recurrent backpropagation algorithm, to solve large-scale static optimization problems such as the Traveling Salesman Problem have confirmed that the performance of the SRN is notable in comparison with the Hopfield network and its stochastic derivatives [Patwardhan, 1999; Serpen et al., 2001]. The ability of the SRN to compute solutions for instances of large-scale static optimization problems for which Hopfield-family neural algorithms fail to locate solutions strongly motivates consideration of a multitude of approaches for training. In an earlier study, the standard backpropagation training algorithm was adapted for the SRN dynamics, and an extensive computational complexity study assessing the promise of training the SRN with standard backpropagation (BP) was performed using the TSP [Serpen and Xu, 2001]. That study demonstrated that it is feasible to train the SRN with BP and that the quality of solutions computed through this approach was higher compared

to the quality of what the SRN trained with RBP was able to compute. However, the computational cost incurred through BP training was substantially higher as well, particularly for large-scale instances of the problem, than the cost incurred by RBP training. Therefore, a computationally efficient version of the BP training algorithm, resilient propagation (RPROP), is poised to offer high-quality solutions at a comparably lower computational cost, which forms the thrust of the research work presented in this paper. This paper proposes to train the Simultaneous Recurrent Neural network using the resilient propagation algorithm for combinatorial optimization, to achieve higher-quality solutions at a computational cost similar to that of the SRN trained with recurrent backpropagation.

Simultaneous Recurrent Neural Network

A Simultaneous Recurrent Neural network (SRN) is an artificial neural network [Werbos et al., 1997] with the graphical representation illustrated in Figure 1.

Figure 1. SRN Graphical Representation: an external input feeds a feedforward mapping that produces the external output, and a propagation path with identity mapping feeds the output back to the input.

The system has external inputs in the form of a vector x, a feed-forward function f(·) (any feedforward mapping, including the multi-layer perceptron network, is appropriate), outputs in the

form of a vector z, and a feedback path. The feedback path copies the outputs to the inputs with a time delay strictly due to propagation of signals. The feed-forward network f(·) also induces a weight matrix W, which represents the interconnection topology of the network. The network, starting from an initial state indicated by the initial value of the output vector, will iterate until the output vector stabilizes and converges to a stable point, given that one exists. In other terms, an SRN is a feed-forward network with feedback from the outputs of the network to its inputs. An SRN exhibits complex temporal behavior: it follows a trajectory in the output space to relax to a fixed point. One relaxation of the network consists of one or more iterations of output computation and propagation along the feed-forward and feedback paths until the outputs converge to stable values. A mathematical characterization of the computation performed by an SRN is given by

z(t) = f[z(t − ξ_z), x(t − ξ_x), W],

where z is the external output, x is the external input, W is the network weight matrix, t is time, and ξ_z and ξ_x are the time delays associated with propagation of the outputs through the feedback and forward paths, and of the inputs through the forward path, respectively. Assuming the presence of one or more stable equilibrium points in the output space of the network dynamics, convergence to a fixed point will occur for sufficiently large t following a relaxation cycle of the network dynamics. Once the network dynamics converge to a fixed point, the delayed value of the outputs is the same as the current value of the outputs:

z(t) = z(t − ξ_z) for sufficiently large t;

therefore, the value of the outputs following convergence can be conveniently represented by z(∞). Upon convergence to a fixed point, the form of the equation modeling the network computation becomes

z(∞) = f[z(∞), x(t − ξ_x), W].

The network is provided with external inputs and initial outputs, which are typically assumed randomly in the absence of a priori information. The outputs delayed through the feedback and forward paths, along with the external inputs delayed through the forward path, are utilized to compute the new output. The network dynamics are allowed to relax until they reach a stable equilibrium point, assuming one exists. External inputs are applied throughout the complete relaxation cycle. When a stable equilibrium point is reached, the outputs stop changing appreciably. It is important to distinguish the propagation delay associated with the feedback path of the SRN from the sampling delay existing in the feedback paths of Time Delay Neural Networks (TDNN): the assumption is that the signal propagation delay in the feedback path of an SRN is typically much smaller than the sampling delays found in TDNNs. The SRN can function in two distinct modes: as a continuous mapper when the external inputs x vary with respect to time, and as an associative memory when the external inputs x are constant over time. The SRN operates as a relaxation-type recurrent network while in associative memory mode, which is the mode typically leveraged to address static optimization.
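As an illustration of the relaxation cycle just described, the following C++ fragment sketches how an SRN in associative-memory mode might be iterated to a fixed point. It is a minimal sketch, not the authors' simulation code: the callable f, the convergence tolerance, and the step limit are illustrative assumptions.

#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Relax an SRN operating in associative-memory mode: the feedforward mapping
// f(z, x) is applied repeatedly, with the output fed back as input, until the
// average change per output node falls below eps or a step limit is reached.
Vec relax(const std::function<Vec(const Vec&, const Vec&)>& f,
          Vec z, const Vec& x, double eps = 1e-2, int max_steps = 10000)
{
    for (int step = 0; step < max_steps; ++step) {
        Vec z_next = f(z, x);                        // z(t) = f[z(t - xi_z), x(t - xi_x), W]
        double change = 0.0;
        for (std::size_t k = 0; k < z.size(); ++k)
            change += std::fabs(z_next[k] - z[k]);
        z = z_next;
        if (change / static_cast<double>(z.size()) < eps)
            return z;                                // fixed point z(infinity) reached
    }
    return z;                                        // step limit hit; best estimate so far
}

In the static-optimization setting considered in this paper, x would be empty and f would be the two-layer mapping of the minimal topology introduced below.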

Once the SRN converges to a fixed point (assuming the associative memory mode of operation), the recurrent inputs are constant over time. This observation reduces the SRN to a feedforward topology subject to special constraints on its inputs. In this case, in addition to the external input x, the feedback is considered to be constant over time once the network has converged to a fixed point. Specifically, the recurrent network is considered as a feedforward network subject to the constraint that

z(∞) = f[z(∞), x, W],    (1)

where z(∞) is the relaxed value of the output of the network with the topology given in Figure 2.

Figure 2. The Simultaneous Recurrent Neural Network upon Convergence to a Fixed Point: the feedforward network f[z(∞), x, W] receives the input x and the relaxed output z(∞), and reproduces the same output z(∞).

The significance of this observation relating to the simplification of the SRN dynamics is that it allows weight training algorithms based on standard (non-recurrent) backpropagation (BP) to be employed rather than the computationally expensive recurrent backpropagation, which requires the setup and relaxation of an adjoint network at significant computational cost [Werbos, 1988; Almeida, 1987; Pineda, 1987].

A minimal topology for the SRN that can address static combinatorial optimization problems is obtained by assuming a three-layer continuous-dynamics perceptron network for the feedforward mapping in Figure 1, with one hidden layer and no external input. As depicted in Figure 3(a), the SRN has three layers, whereby the nodes in the input layer simply distribute the incoming signals through weighted connections to the nodes in the hidden layer without any form of processing. The computational structure in Figure 3(a) is realized for K, J, and K nodes in the input layer, hidden layer, and output layer, respectively. In Figures 3(a) and (b), y and z represent the output of the hidden layer and the external output, whereas U and V are the K × J and J × K weight matrices associated with the forward and backward signal propagations, respectively. Figure 3(b) shows the minimal SRN topology for static optimization, which is identical to the topology in Figure 3(a) following a transformation to eliminate redundancy.

Figure 3. Minimal SRN Topology for Static Optimization Problems: (a) minimal SRN topology with input, hidden (y), and output (z) layers connected through the weight matrices U and V; (b) transformed minimal SRN topology with the redundant input layer removed.

Node dynamics for the output layer of the SRN are represented by

ds_k/dt = −s_k + Σ_{j=1}^{J} u_kj y_j  and  z_k = f(s_k)  for k = 1, 2, …, N²,    (2)

where y_j is the output of the j-th node in the hidden layer, J is the node count in the hidden layer, u_kj is the forward weight from the j-th node in the hidden layer to the k-th node in the output layer, N × N are the dimensions of the output array, and f is a continuous, differentiable function, typically a sigmoid. Similarly, for a node y_j in the hidden layer, the dynamics are defined by

ds_j/dt = −s_j + Σ_{k=1}^{N²} v_jk z_k  and  y_j = f(s_j)  for j = 1, 2, …, J,    (3)

where z_k is the output of the k-th node in the output layer, J is the node count in the hidden layer, v_jk is the backward weight from the k-th node in the output layer to the j-th node in the hidden layer, N × N are the dimensions of the output array, and f is a continuous and differentiable function, typically a sigmoid.

Traveling Salesman Problem and SRN Topology

The Traveling Salesman Problem (TSP) was chosen for the performance assessment of the SRN because it belongs to the class of NP-hard optimization problems and previous neuro-optimizers failed to compute high-quality solutions within acceptable computational cost bounds [Serpen et al., 2001]. For a solution of the TSP to be valid, it is required that each city be visited once and exactly once, with the goal of minimizing the total distance traveled. In the application of the SRN to solve the TSP, the set of potential solutions for an N-city TSP is represented by an N × N node array realized through the output layer of the SRN. Each row of the array represents a different city and each column represents a possible position of that city in the path. The

network outputs should provide a clear indication of which city is selected as the travel path is decoded from the SRN outputs. Thus, it is desirable for the node outputs in the output layer to be as close as possible to the limiting values of 0.0 and 1.0 for the choice of a unipolar sigmoid activation function for the nodes. The SRN topology proposed for the TSP is a two-layer recurrent network as in Figure 4. The output layer consists of an N × N array of nodes, which represents the solution of the TSP in the form of a two-dimensional array. The SRN has a single hidden layer containing J nodes. Each node in the network has an associated weight vector: u_k = [u_k1 u_k2 … u_kJ]^T and v_j = [v_j1 v_j2 … v_jK]^T represent the weight vectors of the k-th node in the output layer, where k = 1, 2, …, K with K = N², and of the j-th node in the hidden layer, respectively. The distance between cities is represented by an N × N symmetric matrix of real numbers randomly specified in the interval [0.0, 1.0] using a uniform probability distribution at the outset of the simulation. Each entry of the cost matrix represents the distance between the two cities identified by the row and column indices of that entry. There is no external input to the SRN; consequently, the SRN has a two-layer topology, typically with a small number of hidden nodes.
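The following C++ fragment sketches one way the relaxation dynamics of Equations (2) and (3) could be discretized for this two-layer SRN. It is an illustrative sketch only: the explicit-Euler step size dt, the unit-slope unipolar sigmoid, and the array layout are assumptions, not details taken from the authors' simulation code.

#include <cmath>
#include <vector>

// Unipolar sigmoid activation with unit slope and limits 0.0 and 1.0.
static double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

// One explicit-Euler step of the node dynamics in Equations (2) and (3).
// z and s_out have K = N*N entries (output layer); y and s_hid have J entries.
// U is K x J (forward weights u_kj); V is J x K (backward weights v_jk).
void srn_step(std::vector<double>& s_out, std::vector<double>& z,
              std::vector<double>& s_hid, std::vector<double>& y,
              const std::vector<std::vector<double>>& U,
              const std::vector<std::vector<double>>& V,
              double dt)
{
    const std::size_t K = z.size(), J = y.size();
    for (std::size_t k = 0; k < K; ++k) {            // ds_k/dt = -s_k + sum_j u_kj * y_j
        double net = 0.0;
        for (std::size_t j = 0; j < J; ++j) net += U[k][j] * y[j];
        s_out[k] += dt * (-s_out[k] + net);
        z[k] = sigmoid(s_out[k]);                    // z_k = f(s_k)
    }
    for (std::size_t j = 0; j < J; ++j) {            // ds_j/dt = -s_j + sum_k v_jk * z_k
        double net = 0.0;
        for (std::size_t k = 0; k < K; ++k) net += V[j][k] * z[k];
        s_hid[j] += dt * (-s_hid[j] + net);
        y[j] = sigmoid(s_hid[j]);                    // y_j = f(s_j)
    }
}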

Figure 4. Architecture of the SRN for the TSP: a J-node hidden layer fully connected to an N × N output layer, with u_k = [u_k1 u_k2 … u_kJ]^T the weight vector of node k in the output layer and v_j = [v_j1 v_j2 … v_jK]^T the weight vector of node j in the hidden layer.

A suitable error function to map the TSP to the SRN topology is given in Equation 4 [Serpen et al., 2001]:

E = g_col Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{m=1, m≠i}^{N} z_ij z_mj + g_row Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{n=1, n≠j}^{N} z_ij z_in + g_bin Σ_{i=1}^{N} Σ_{j=1}^{N} [1 − (z_ij − 0.5)²] + g_dis Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{m=1}^{N} d_im z_ij z_m(j+1),    (4)

where g_col, g_row, g_bin, and g_dis are positive real weight parameters, d_im is the distance between city i and city m, and N is the number of cities. The first and second double summations enforce the constraints that each column and each row, respectively, has exactly one node active in the solution. The third double summation encourages node outputs to approach the limiting values of 0.0 or 1.0 rather than a value in between. The last summation term seeks to minimize the total travel distance of a solution. Details on the formulation of this error function are presented in the Appendix.
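A direct transcription of Equation (4) into C++ might look as follows. This is a sketch for illustration, not the authors' code: the output array z is assumed to be indexed as z[i][j] for city i at tour position j, and the tour is treated as cyclic so that position j+1 wraps around to the first position.

#include <vector>

// Evaluate the TSP error function of Equation (4). z[i][j] is the output of
// the node for city i at tour position j; d is the N x N distance matrix.
double tsp_error(const std::vector<std::vector<double>>& z,
                 const std::vector<std::vector<double>>& d,
                 double g_col, double g_row, double g_bin, double g_dis)
{
    const std::size_t N = z.size();
    double E = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            const std::size_t jn = (j + 1) % N;      // next position on the (cyclic) tour
            double col = 0.0, row = 0.0, dis = 0.0;
            for (std::size_t m = 0; m < N; ++m) {
                if (m != i) col += z[m][j];          // another city in the same column
                if (m != j) row += z[i][m];          // the same city at another position
                dis += d[i][m] * z[m][jn];           // distance to whichever city follows
            }
            E += g_col * z[i][j] * col               // column constraint term
               + g_row * z[i][j] * row               // row constraint term
               + g_bin * (1.0 - (z[i][j] - 0.5) * (z[i][j] - 0.5))   // push toward 0 or 1
               + g_dis * z[i][j] * dis;              // total travel distance term
        }
    }
    return E;
}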

RPROP Algorithm for SRN Training

Resilient propagation (RPROP) is an efficient learning scheme that directly adjusts the weights of the network based on local gradient information [Riedmiller and Braun, 1993]. Consider the SRN upon convergence to a fixed point as in Figure 3(a). The weight update rule for the RPROP training algorithm is given by [Riedmiller and Braun, 1993; Cichocki and Unbehauen, 1993]

w_ij(t_d + 1) = w_ij(t_d) − sgn(∂E(t_d)/∂w_ij) · Δ_ij(t_d),   if ∂E(t_d)/∂w_ij · ∂E(t_d − 1)/∂w_ij ≥ 0,
w_ij(t_d + 1) = w_ij(t_d) − σ · Δw_ij(t_d − 1),                otherwise,    (5)

where w_ij(t_d) is the value of the weight between the j-th and i-th nodes in any two consecutive layers at discrete iteration t_d, Δ_ij(t_d) is the adaptive update value (step size) given by Equation 6, Δw_ij(t_d − 1) is the previous weight update,

sgn(x) = −1 for x < 0,  0 for x = 0,  and +1 for x > 0,

and 0 ≤ σ ≤ 1. Note that when the error partial changes sign in two consecutive steps, suggesting that the previous update value was too large and consequently the minimum was missed, the corresponding weight is updated in a direction that returns the weight value close to its value prior to the previous update. In other words, the previous weight update value is subtracted from the corresponding weight, following a possible scaling through the parameter σ. The learning rate is adapted through the following procedure:

Δ_ij(t_d) = min{η⁺ · Δ_ij(t_d − 1), η_max},   if ∂E(t_d − 1)/∂w_ij · ∂E(t_d)/∂w_ij > 0,
Δ_ij(t_d) = max{η⁻ · Δ_ij(t_d − 1), η_min},   if ∂E(t_d − 1)/∂w_ij · ∂E(t_d)/∂w_ij < 0,
Δ_ij(t_d) = Δ_ij(t_d − 1),                     otherwise,    (6)

where 0 < η⁻ < 1 < η⁺, η_min ≈ 10⁻⁶, and η_max ≈ 10², while noting that the values of these parameters are empirically determined. The RPROP training algorithm does not utilize the magnitude of the partial derivative of the error for weight updates: it monitors the sign of the partial derivative of the error with respect to each adaptable weight in the network and modifies the weights using the values computed through Equation 6. A high-level explanation of the adaptation rule is presented in Figure 5.

1. Compute two consecutive values of the partial derivative of the error with respect to each adaptable weight. If a sign change occurs, indicating that the local minimum was missed, decrease the update value; else, if no sign change occurs, indicating that gradient descent is in progress, increase the update value for the weight.
2. Check the derivative of the error function with respect to each adaptable weight: if positive, indicating that the error is increasing, subtract the update value from the weight value; else, if negative, indicating that the error is decreasing, add the update value to the weight value.
3. Repeat steps 1 and 2 until convergence.

Figure 5. Pseudo code for the Resilient Propagation Training Algorithm.
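A compact C++ sketch of the update rules in Equations (5) and (6) is given below. It is illustrative only: the per-weight state kept (previous gradient, step size, previous update) follows the standard RPROP formulation of Riedmiller and Braun, and the default parameter values correspond to the settings quoted later in the simulation study (η⁺ = 2.0, η⁻ = 0.4, σ = 1.0); they are not taken from the authors' source code.

#include <algorithm>
#include <vector>

// Per-weight state for RPROP: previous error partial, adaptive step size
// Delta_ij, and the previously applied update (needed for backtracking).
struct RpropState {
    std::vector<double> prev_grad;
    std::vector<double> step;
    std::vector<double> prev_update;
};

// Apply one RPROP iteration (Equations (5) and (6)) to a flat weight vector.
void rprop_update(std::vector<double>& w, const std::vector<double>& grad,
                  RpropState& st,
                  double eta_plus = 2.0, double eta_minus = 0.4,
                  double step_min = 1e-6, double step_max = 1.0,
                  double sigma = 1.0)
{
    for (std::size_t i = 0; i < w.size(); ++i) {
        const double sign_product = st.prev_grad[i] * grad[i];
        if (sign_product >= 0.0) {
            if (sign_product > 0.0)                   // same sign: grow the step size
                st.step[i] = std::min(st.step[i] * eta_plus, step_max);
            const double s = (grad[i] > 0.0) ? 1.0 : (grad[i] < 0.0 ? -1.0 : 0.0);
            const double update = -s * st.step[i];    // move against the gradient sign
            w[i] += update;
            st.prev_update[i] = update;
            st.prev_grad[i] = grad[i];
        } else {                                      // sign change: the minimum was missed
            st.step[i] = std::max(st.step[i] * eta_minus, step_min);
            w[i] -= sigma * st.prev_update[i];        // back out the previous update
            st.prev_update[i] = 0.0;
            st.prev_grad[i] = 0.0;                    // avoid a second backtrack next step
        }
    }
}

Only the sign of each error partial enters the update, which is what makes the scheme insensitive to the widely varying gradient magnitudes produced by the constraint-weighted error function.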

The RPROP algorithm requires the partial derivative of the error function with respect to each and every adaptable weight in the network to be computed in accordance with Equation 6. The partial derivative of the error function with respect to the weight u_kj between the j-th hidden layer node and the k-th output layer node, with j = 1, 2, …, J and k = 1, 2, …, K, the derivation of which is detailed in the Appendix, is as follows:

∂E/∂u_kj = Σ_{q=1}^{N} Σ_{r=1}^{N} [ 2 g_col Σ_{m=1, m≠q}^{N} z_mr + 2 g_row Σ_{n=1, n≠r}^{N} z_qn − 2 g_bin (z_qr − 0.5) + g_dis Σ_{m=1}^{N} d_qm z_m(r+1) ] ∂z_qr/∂u_kj,    (7)

where the term ∂z_qr/∂u_kj is readily computable given the node dynamics in the output layer as specified in Equation 2. The partial derivative of the error function with respect to the weights belonging to the hidden layer nodes is defined as [Zurada, 1992]

∂E/∂v_jk = f′( Σ_{k′=1}^{K} v_jk′ z_k′ ) · z_k · Σ_{k′=1}^{K} δ_ok′ u_k′j,    (8)

where k = 1, 2, …, K and j = 1, 2, …, J. Note that

δ_ok = ∂E/∂net_k = (∂E/∂z_k)(∂z_k/∂s_k) = (∂E/∂z_k) f′( Σ_{j=1}^{J} u_kj y_j ),    (9)

where f′ is the derivative of the activation function f, and y_j is the output of the j-th node in the hidden layer. The term ∂E/∂z_k is the derivative of the error function with respect to the k-th node output in the output layer and is given in the Appendix; for the output node k located at row q and column r of the output array,

∂E/∂z_k = 2 g_col Σ_{m=1, m≠q}^{N} z_mr + 2 g_row Σ_{n=1, n≠r}^{N} z_qn − 2 g_bin (z_qr − 0.5) + g_dis Σ_{m=1}^{N} d_qm z_m(r+1),    (10)

where k = 1, 2, …, K and N is the number of cities for the TSP.

Simulation Study

The simulation study was formulated to investigate the feasibility of training the SRN, configured as a static optimizer, with the resilient propagation training algorithm. The study aimed to assess the computational cost and, accordingly, to perform a computational cost comparison among the SRNs trained with the resilient propagation, standard backpropagation, and recurrent backpropagation training algorithms. In order to facilitate a fair comparison among the three training algorithms, the simulation software and the computing platform complied with the following requirements. The simulation code was written in C/C++ to run in a UNIX operating system environment. The same template C/C++ code, which was the basis for the standard backpropagation as well as the recurrent backpropagation implementations, was utilized to derive the resilient propagation implementation of the training algorithm for the SRN. The computing platform utilized for the simulations was a Sun Sparc workstation with two 300 MHz

CPUs and 1.2 GB of main memory, which was also employed to run the code for the other two training algorithms [Serpen et al., 2001; Serpen and Xu, 2001].

Setup and Initialization for TSP

A number of parameters need to be set and initial values determined in order for the SRN to be properly configured to solve the TSP. There is no external input to the SRN; thus, the SRN has a two-layer topology: one hidden layer and one output layer. Specification of the number of nodes in the hidden layer is highly critical, since it has significant consequences for the ability of the SRN simply to locate a solution, for the computational complexity of the SRN algorithm itself, and for the total computational cost incurred in finding a locally optimal solution as measured by the number of iterations needed. A preliminary empirical study demonstrated that the node count in the hidden layer had a drastic effect on the number of iterations needed for the SRN to converge to a solution. When experimenting with the number of hidden layer nodes to employ, it was determined that increasing the number of hidden nodes tended to considerably decrease the time for the SRN to find a solution. On the other hand, by simply doubling the number of hidden layer nodes, the number of weights from the hidden layer to the output layer and from the output layer to the hidden layer also doubles, which doubles the memory requirements. This in turn increases the number of calculations required for each iteration, leading to a potentially significant performance degradation. The choice of eight hidden layer nodes was determined to be a reasonable compromise on the computing platform utilized, keeping the memory and computational requirements within practical bounds.

The network node outputs were randomly initialized using a uniform probability distribution. The initial values of the outputs of all the nodes in both layers were set to values in the range 0.0 to 1.0. The weights were randomly initialized to values in the interval [-0.2, 0.2], also with a uniform probability distribution. The same activation function was used for all the nodes in the network, where the slope or steepness was set to 1.0 and the limits were 0.0 and 1.0 for a unipolar specification. Nodes were considered active or inactive for threshold values of 0.8 and 0.2, respectively. An N × N matrix represents the distances between cities. The row index represents the starting city and the column index represents the destination city; hence, each entry of the matrix is the distance between the two cities represented by the indices of that entry. The distance matrix is symmetric, and each entry is randomly set to a value between 0.0 and 1.0 using a uniform probability distribution. According to this distance matrix, the expected normalized distance (obtained by dividing the total path length by the number of cities) for a randomly chosen path is 0.5.

The error function that maps the TSP to the SRN topology has a number of parameters that are employed to reflect the significance of each individual constraint. Initial values of these constraint weighting coefficients need to be specified to facilitate the simulation study. Since no theoretical bounds on the values of these coefficients are currently known to exist, the only option is to resort to empirical means.
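The initialization just described might be sketched in C++ as follows. This is an illustrative fragment under the stated assumptions (eight hidden nodes, node outputs in [0.0, 1.0], weights in [-0.2, 0.2], and a symmetric random distance matrix with entries in [0.0, 1.0]); it is not reproduced from the authors' simulation code, and the structure and function names are hypothetical.

#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// State of the SRN configured for an N-city TSP with J hidden nodes.
struct SrnTspSetup {
    Matrix U;                 // (N*N) x J forward weights u_kj
    Matrix V;                 // J x (N*N) backward weights v_jk
    std::vector<double> z;    // N*N output-layer node outputs
    std::vector<double> y;    // J hidden-layer node outputs
    Matrix d;                 // N x N symmetric distance matrix
};

SrnTspSetup initialize(std::size_t N, std::size_t J = 8, unsigned seed = 1)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);   // outputs and distances
    std::uniform_real_distribution<double> wgt(-0.2, 0.2);   // connection weights

    SrnTspSetup s;
    const std::size_t K = N * N;
    s.U.assign(K, std::vector<double>(J));
    s.V.assign(J, std::vector<double>(K));
    for (auto& row : s.U) for (double& u : row) u = wgt(gen);
    for (auto& row : s.V) for (double& v : row) v = wgt(gen);

    s.z.resize(K); for (double& zk : s.z) zk = unit(gen);
    s.y.resize(J); for (double& yj : s.y) yj = unit(gen);

    s.d.assign(N, std::vector<double>(N, 0.0));
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = i + 1; j < N; ++j)
            s.d[i][j] = s.d[j][i] = unit(gen);       // symmetric, zero diagonal
    return s;
}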

It was empirically determined that the precise values of these parameters had little effect on the performance of the SRN. However, the same empirical analysis showed that the ratio between these parameter values was very important. Excessive emphasis placed on the row and column error terms, by specifying much larger values for the constraint weight parameters associated with these two constraints, will lead to a solution quickly, but the tour length is highly likely to be far from the optimal value. A small tour length may be achieved by emphasizing the distance term through a relatively larger value for its constraint weight parameter, but this mostly comes at the expense of converging to invalid solutions, in which either the row or the column constraint, or both, are violated. Monitoring each individual error term demonstrated that the values of the normalized row and column error terms were noticeably larger than the value of the distance error term during the initial phases of training. In order to keep the row and column error terms from dominating the total error, the distance weight parameter was initialized to a value approximately three times larger than the values of the other constraint weight parameters. The values of the constraint weight parameters in Table 1 provided a good starting point for the SRN trained using recurrent backpropagation, as reported in an earlier study [Serpen et al., 2001]. Therefore, the weight parameters were initialized to the values shown in Table 1 for the SRN trained using the resilient propagation training algorithm as well. Furthermore, the need to change the values of these parameters during the course of training also emerged from a preliminary empirical analysis. More specifically, the constraint weight parameters needed to be increased following a set schedule, which turned out to be critical for the SRN to locate a solution; this concurs with the findings of previous studies [Serpen et al., 2001; Serpen and

Xu, 2001]. In order to encourage the SRN algorithm to locate higher-quality solutions, the increment value of the constraint weight parameter for the distance constraint was specified to be as large as possible relative to the values for the remaining parameters, while ensuring that the network still converges to valid solutions. Incrementing the parameters every five iterations appeared to provide a reasonable choice for quick convergence towards a solution: an earlier study notes the difficulty of converging to a solution if these parameters are updated more frequently [Serpen et al., 2001]. Hence, the weight parameters were incremented after every fifth iteration according to the values given in Table 1.

Table 1. Constraint Weight Parameter Initial and Increment Values for g_col, g_row, g_bin, and g_dis.

In order to make an accurate comparison among the resilient propagation, standard backpropagation, and recurrent backpropagation algorithms, assessments are made of the quality of solutions, the number of iterations, and the computational cost as measured by the computation time. The quality of a solution is measured by computing the normalized average distance (NAD) between any two cities in the solution path the SRN locates, which is defined by

NAD = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{m=1}^{N} z_ij z_m(j+1) d_im,    (11)

where d_im is the distance between city i and city m, and N is the number of cities. The computing time is determined by using the UNIX operating system utility timex, which returns the processor time expended to complete a process.

The stopping criterion for the training of the network is another critical issue. One standard approach is to stop training as soon as a valid solution is found. For a gradient descent-based search algorithm, this solution is guaranteed to be at least locally optimal. This criterion, which was employed for training the SRN with standard backpropagation and recurrent backpropagation to address static combinatorial optimization problems [Patwardhan, 1999; Serpen et al., 2001], was employed for the simulations implemented using the resilient propagation training algorithm as well, in order to establish compatibility for performance comparison purposes. It is worthwhile to note that training beyond this point is likely to result in higher-quality solutions, while noting the inherent limitations associated with a gradient search algorithm like resilient propagation.

Another important consideration is the convergence criterion, which determines the conclusion of a relaxation of the SRN dynamics. This determination is facilitated by deciding when to declare that the SRN dynamics have converged to a stable equilibrium point, using the inequality

|∂z_k/∂t| < ε for all k with k = 1, 2, …, K,

where z_k is the k-th node of the output layer, K is the number of nodes in the output layer, and ε ∈ R⁺ with ε → 0. A practical implementation of this condition can be realized through the following inequality:

(1/K) Σ_{k=1}^{K} |∂z_k/∂t| < 0.01,

which establishes a sufficiently small upper bound for convergence to a fixed point for large values of K, the number of nodes in the output layer.

The learning rate is an important factor that affects the ability of the RPROP algorithm to locate a solution. If the learning rate is too large, oscillation may occur as the network approaches a solution. At the other extreme, if the learning rate is too small, the network will take longer to reach a solution due to the slow learning speed. To evaluate the effect of the learning rate, different values for both the positive and negative learning rates were tested using a problem size of 100 cities. The values tested for the positive learning rate were 1.2, 1.3, 1.5, 1.7, 1.9, 2.0, 2.2, and 2.4. The values tested for the negative learning rate included 0.4, 0.5, and 0.6. The values of the parameters η_min, η_max, and σ were specified as 10⁻⁶, 1.0, and 1.0, respectively. Table 2 shows the results as the average of ten simulations run for each learning rate. Changing the learning rate appears to have little distinguishable effect on the quality of the solution. However, a higher positive learning rate does accelerate convergence and decreases the average CPU time and the average number of iterations required to reach a solution. The negative learning rate does not seem to have a significant effect on performance, although the value of 0.6 did show slightly worse performance than the other two values. A positive learning rate of 2.0 seems to

have the best combination of results for average CPU time and normalized average distance, and a negative learning rate of 0.4 had the best results for normalized average distance. In light of these findings, the positive and negative learning rates were specified as 2.0 and 0.4, respectively, for the simulation study.

Table 2. Averaged Values of Results for 10 Trials at Various Learning Rates for the 100-City TSP: normalized distance, average number of iterations, and average CPU time (minutes:seconds) for the learning-rate pairs {η⁺, η⁻} of {1.2, 0.5}, {1.3, 0.5}, {1.5, 0.5}, {1.7, 0.5}, {1.9, 0.5}, {2.0, 0.5}, {2.2, 0.5}, {2.4, 0.5}, {2.0, 0.4}, and {2.0, 0.6}.

Simulations for City Counts in the Range 10 to 100

The first simulation study for comparative performance assessment was implemented for problem sizes of 10, 30, 50, 70, and 100 cities. A total of 10 simulations were run for each problem size in order to ensure the accuracy of the measured values. The normalized average distance, the number of iterations, and the CPU time expended were recorded for each simulation run. For each problem size, the results from all 10 simulations run using the resilient propagation (RPROP) training algorithm are given, as well as the average value, which is then compared against the best result in terms of each measure calculated (normalized average distance, number of

iterations, and CPU time expended) for each case from the simulations run using standard backpropagation (BP) and recurrent backpropagation (RBP) [Serpen and Xu, 2001].

Table 3 shows the results for a problem size of 10 cities. Although the quality of the solutions located by the RPROP is better than that of a randomly chosen path, it is not as good as those found with either BP or RBP. Also of interest is the drastic difference between the numbers of iterations required to find a solution, which implies that training with the BP requires considerably more calculations than training with the RPROP. Note that because the recurrent backpropagation algorithm is not comparable to the other two training algorithms on the basis of the number of iterations, the corresponding slot in Table 3 and in the similar tables that follow is left blank.

Table 3. Simulation Results for the 10-City TSP: normalized distance, CPU time (seconds), and number of iterations for each of the 10 RPROP trials, the RPROP average, BP (best), and RBP (best).

The results for a problem size of 30 cities are shown in Table 4. The quality of the solutions computed by the RPROP surpasses those computed by either the BP or the RBP. The RPROP

performance also exceeds that of the other two on the basis of the CPU time and, where applicable, the number of iterations required to reach a valid solution.

Table 4. Simulation Results for the 30-City TSP: normalized distance, CPU time (minutes:seconds), and number of iterations for each of the 10 RPROP trials, the RPROP average, BP (best), and RBP (best).

The results for the problem size of 50 cities are given in Table 5. The RPROP performs markedly better in terms of locating higher-quality solutions compared to the other two training algorithms, while also requiring much less CPU time. Table 6 shows the comparative performance results as the problem size is increased to 70 cities. The quality of solutions continues to improve for the RPROP, and the disparity between the results from training with the RPROP and the results from training with either the BP or the RBP continues to grow in all three measured categories. Both the number of iterations and the CPU time are significantly less than the corresponding values when training the SRN with the standard BP or the RBP.

Table 5. Simulation Results for the 50-City TSP: normalized distance, CPU time (minutes:seconds), and number of iterations for each of the 10 RPROP trials, the RPROP average, BP (best), and RBP (best).

Table 6. Simulation Results for the 70-City TSP: normalized distance, CPU time (minutes:seconds), and number of iterations for each of the 10 RPROP trials, the RPROP average, BP (best), and RBP (best).

Table 7 shows the results when the problem size is increased to 100 cities. The RPROP algorithm performs notably better in all three measured categories compared to the other two algorithms.

Table 7. Simulation Results for the 100-City TSP: normalized distance, CPU time (minutes:seconds), and number of iterations for each of the 10 RPROP trials, the RPROP average, BP (best), and RBP (best).

Figures 8 and 9 are graphical comparisons of, respectively, the quality of solutions and the CPU time to reach a solution for the SRN trained with the RPROP against the SRN trained with standard backpropagation and recurrent backpropagation. The values used in each graph are the averages of 10 simulations for each problem size for the SRN trained with the RPROP, compared against the best results for the same problem size for the SRN trained with standard backpropagation and recurrent backpropagation. The simulation results clearly show that, with the exception of the initial problem size of 10 cities, the SRN trained with the RPROP algorithm produces solutions that are considerably superior in every measurable category compared to the

solutions produced using the SRN trained with standard backpropagation and recurrent backpropagation, and that this disparity broadens as the problem size increases.

Figure 8. Comparison of Quality of Solutions for RPROP, BP, and RBP (normalized average distance versus TSP city count).

It is further relevant to note the trends of the curves in Figures 8 and 9. All three training algorithms generate curves with a slightly negative slope as the problem size is increased, which is more pronounced in the case of RPROP; this is a very important property of the SRN for addressing large-scale static optimization problems. The BP training time curve appears to follow an exponential growth trend, while the RPROP demonstrates modest growth as the problem size is increased.

Figure 9. Comparison of the CPU Time for RPROP, BP, and RBP (CPU time in seconds versus TSP city count).

Simulation Results for City Counts in the Range 200 to 500

The performance of the RPROP algorithm was also tested on 200, 300, 400, and 500 city counts for the TSP. The results are presented in Table 8, where the values reported for the RPROP are averages of 10 trial runs. The BP algorithm was not attempted for these problem sizes due to its prohibitively high computational cost, as reported in [Serpen and Xu, 2001]. Table 8 also depicts the performance results for the RBP algorithm for the same TSP city counts. Solutions were reached following a search duration typically on the order of hours for these problem sizes: for problem sizes of 300 and 500 cities, the CPU times were 2 hours 26 minutes 9 seconds and 2 hours 48 minutes 44 seconds, respectively. The quality of the solutions computed by the RPROP algorithm is significantly superior to that of the solutions computed by the RBP algorithm, although the CPU times are relatively close to each other. In conclusion, the RPROP training algorithm proved to be more scalable in terms of its ability to locate higher-quality solutions than either standard backpropagation or recurrent backpropagation as the problem size grew.

Table 8. SRN Performance for Large-Size TSPs: normalized distance and CPU time (hours:minutes) for RPROP and RBP at TSP city counts of 200, 300, 400, and 500.

Quality of Solutions Computed by the SRN

The quality of the solutions computed by the SRN can be precisely assessed through comparison with the optimal value of the solution in terms of minimum tour length. The type of TSP problem addressed in this study has the following properties: it has a random distance matrix, the placement of cities is non-Euclidean, and the inter-city distances are symmetric. A recent report on the state of the art for heuristics-based approaches to the TSP [Johnson and McGeoch, 2002] discusses a number of instances of the TSP in which the distance matrix entries are generated using random integers independently drawn from a uniform distribution. Furthermore, the distance matrix was symmetric and the city placement obeyed a non-Euclidean geometry; therefore, these instances are comparable to the cases presented in our study. Johnson et al. report four instances of the 1000-city, two instances of the 3162-city, and one instance of the 10000-city TSP. The normalized inter-city distances for the optimal solutions as well as the Held-Karp lower bounds [Held and Karp, 1970 & 1971] were computed with the Concorde package [Applegate et al., 1998] and are presented in Table 9 as reported in [DIMACS, 2002].

Table 9. Optimal Values and Held-Karp Bounds on Random-Distance TSP Instances: Held-Karp bound, optimal value, and normalized inter-city distance for four 1000-city instances, two 3162-city instances, and one 10000-city instance.

The results of the optimal value computations for the various city sizes in Table 9 indicate that it is reasonable to conclude that the optimal tour length is approximately, or slightly more than, 2,000,000. Noting that different bounds for the uniform interval were employed in our study, this optimal tour length maps to approximately 2.0. This value of the optimal tour length suggests that the normalized inter-city distance for the 100-city problem is approximately 0.02 (normalized inter-city distance of the optimal tour = optimal tour length / number of cities) for the case where inter-city distances are drawn from the interval [0.0, 1.0]. The SRN computed solutions with a normalized inter-city distance of approximately 0.15 for the 100-city TSP instance, which is still subject to improvement considering the optimal value of approximately 0.02. There is significant potential for improvement, though, since the search algorithm implemented by the SRN is based on gradient descent, which is a greedy heuristic bound to be trapped by local minima and hence lacks the computational features needed to locate the highest-quality solutions. Enhancement of the SRN search algorithm with a stochastic mechanism would provide the needed leverage to improve its ability to compute much higher-quality solutions.

The SRN appears to further demonstrate an intrinsic ability to scale, in a limited capacity, with increases in the TSP instance size. As can be noted from Figure 8, the solution quality curve tends towards lower values as the TSP instance size increases. Along the same lines, the results in Table 8 also suggest that the quality of solution improves, albeit slightly, as the TSP instance size increases: the SRN computed a tour with a normalized inter-city distance value of approximately 0.1 for the 500-city TSP.

The real computational promise of neural optimization paradigms lies in contexts where real-time computing requirements can only be satisfied by massively parallel hardware realizations of the search algorithm, and where the computing time needs to be independent of the size of the problem. Neural algorithms, if implemented in hardware, facilitate trading time complexity for space (hardware) complexity, leading to constant computing times for any size of a given optimization problem. Otherwise, existing state-of-the-art algorithms employing heuristics clearly dominate in terms of their ability to compute near-optimal solutions for TSP instances with up to 1,000,000 cities [Johnson and McGeoch, 1997].

Effect of Extended Training on Solution Quality

As noted earlier, it was conjectured that training the SRN beyond the point at which a locally optimal solution is found could lead to improved solution quality. In order to empirically test this conjecture, the quality of solutions computed by the SRN for the 100-city TSP instance was studied as a function of the number of iterations. The simulation results are presented in Figure 9.

Figure 9. Solution Quality vs. Training Iterations for the 100-City TSP (normalized average distance versus number of iterations).

The results presented in Figure 9 imply that the solution quality tends to improve as the SRN is trained well beyond the point where the initial locally optimal solution is located. However, the degree of improvement might not be significant given the cost incurred by the additional computations needed. It is conceivable that a stochastic search mechanism might be a better choice and offer a more distinct improvement in solution quality at a comparable computational cost.

Conclusions

The effectiveness of training a Simultaneous Recurrent Neural network with the resilient propagation algorithm was compared against training the same network with the standard backpropagation as well as the recurrent backpropagation algorithms, to explore the feasibility and the promise of greater computational efficiency while maintaining or possibly improving the quality of the solutions

produced. The performance comparison was implemented through application of the SRN to a static NP-hard combinatorial optimization problem, the Traveling Salesman problem. Although resilient propagation was initially considered for its promise of reducing the computational cost associated with searching for a solution of a large-scale static optimization problem, the simulation study showed that resilient propagation offers much higher-quality solutions at a similar computational cost when compared with the RBP. This finding is noteworthy because improving the quality of solutions would otherwise typically require moving towards stochastic versions of the recurrent backpropagation training algorithm at a significantly greater computational cost.

Acknowledgements

Funding provided through United States National Science Foundation (NSF) Grant No. and The University of Toledo 2001 URAFP Grant for this research project is gratefully acknowledged. Findings, conclusions, and opinions expressed are the authors' and do not reflect those of either the NSF or the University of Toledo.

Bibliography

Almeida, L. B., A Learning Rule for Asynchronous Perceptrons with Feedback in a Combinatorial Environment, Proceedings of the IEEE 1st International Conference on Neural Networks, San Diego, CA, 1987.

Applegate, D., Bixby, R. E., Chvatal, V., and Cook, W., On the Solution of Traveling Salesman Problems, Documenta Mathematica, Extra Volume ICM III, 1998. Concorde code available online.

Cichocki, A. and Unbehauen, R., Neural Networks for Optimization and Signal Processing, Wiley, 1993.

DIMACS 2001 Symmetric TSP Challenge results, cited on August 10th, 2002.

Held, M. and Karp, R. M., The Traveling Salesman Problem and Minimum Spanning Trees, Operations Research, 18:1138-1162, 1970.

Held, M. and Karp, R. M., The Traveling Salesman Problem and Minimum Spanning Trees: Part II, Mathematical Programming, 1:6-25, 1971.

Hopfield, J. J. and Tank, D. W., Neural Computation of Decisions in Optimization Problems, Biological Cybernetics, Vol. 52, pp. 141-152, 1985.

Johnson, D. S. and McGeoch, L. A., The Traveling Salesman Problem: A Case Study in Local Optimization. In E. H. L. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Optimization, John Wiley & Sons, New York, 1997.

Johnson, D. S. and McGeoch, L. A., Experimental Analysis of Heuristics for the STSP. To appear in The Traveling Salesman Problem and its Variants, Gutin and Punnen, editors, Kluwer Academic Publishers, 2002.

Patwardhan, A., The Simultaneous Recurrent Neural Network for Static Optimization Problems, Master of Science in Engineering Science Thesis, The University of Toledo, 1999.

Pineda, F. J., Generalization of Back-Propagation to Recurrent Neural Networks, Physical Review Letters, Vol. 59, 1987.

Riedmiller, M. and Braun, H., A Direct Adaptive Method for Fast Backpropagation Learning: The RPROP Algorithm, Proceedings of the IEEE International Conference on Neural Networks, Vol. 5, 1993.

Serpen, G., Patwardhan, A., and Geib, J., The Simultaneous Recurrent Neural Network Addressing the Scaling Problem in Static Optimization, International Journal of Neural Systems, Vol. 11, No. 5, 2001.

Serpen, G. and Livingston, D. L., Determination of Weights for Relaxation Recurrent Neural Networks, Neurocomputing, Vol. 34, 2000.

Serpen, G. and Xu, Y., Training Simultaneous Recurrent Neural Network with Non-Recurrent Backpropagation Algorithm, manuscript in review, 2001.

Werbos, P. J., Generalization of Backpropagation with Application to a Recurrent Gas Market Model, Neural Networks, Vol. 1, No. 4, 1988.

Werbos, P. J. and Pang, X., Generalized Maze Navigation: SRN Critics Solve What Feedforward or Hebbian Nets Cannot, Proceedings of the World Congress on Neural Networks, San Diego, CA, September 1996.

Zurada, J. M., Introduction to Artificial Neural Systems, PWS Publishing Company, Boston, MA, 1995.

Appendix
Error Function and its Partial Derivatives

In order to train the Simultaneous Recurrent Neural network (SRN) for the Traveling Salesman problem (TSP), it is necessary to define a measure of the error for the output nodes. The error function needs to ensure valid solutions as well as a minimum tour length. Certain constraints need to be in place to ensure a valid solution. Given the problem representation presented in Figure 4, each row and column in the N × N array must have exactly one node active: the output value of an active node should be close to 1.0, while the output value of each inactive node in the array must approach the limiting value of 0.0. The Traveling Salesman problem dictates that each city must be visited once, and only once. Thus, when the network converges to a solution, there should be exactly one node active in each row and each column. This constraint can be implemented using inhibition between the nodes in a given row and column. The error term for the column constraint is defined by

E_col = g_col Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{m=1, m≠i}^{N} z_ij z_mj,    (A-1)

where i and j are the indices for rows and columns, respectively, m is an additional index over the rows of the output array, z_mj is the stable value of the mj-th output node upon convergence to a fixed point, and g_col is a positive real weight parameter. When each column of the output matrix has exactly one active node, this error term is zero. The first summation over the indexing variable i is included because the error function needs to be defined for each node in the output layer. Similarly, the error term for the row constraint is given by

E_row = g_row Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{n=1, n≠j}^{N} z_ij z_in,    (A-2)
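As a check on the claim that these constraint terms vanish for a valid tour, the short C++ program below evaluates the column and row terms of Equations (A-1) and (A-2) (with g_col = g_row = 1) on a permutation matrix and on an invalid assignment. It is an illustrative sketch added for this transcription, not part of the original study.

#include <cassert>
#include <vector>

// Column and row constraint terms of Equations (A-1) and (A-2) with unit
// weight parameters; z[i][j] is the output for city i at tour position j.
double col_row_error(const std::vector<std::vector<double>>& z)
{
    const std::size_t N = z.size();
    double e = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t m = 0; m < N; ++m) {
                if (m != i) e += z[i][j] * z[m][j];   // two active nodes in column j
                if (m != j) e += z[i][j] * z[i][m];   // two active nodes in row i
            }
    return e;
}

int main()
{
    // A valid 4-city tour encoded as a permutation matrix: exactly one active
    // node per row and per column, so both constraint terms vanish.
    std::vector<std::vector<double>> z = {
        {0, 1, 0, 0},
        {0, 0, 0, 1},
        {1, 0, 0, 0},
        {0, 0, 1, 0}};
    assert(col_row_error(z) == 0.0);

    z[0][0] = 1.0;   // activate a second node in row 0 and column 0
    assert(col_row_error(z) > 0.0);
    return 0;
}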


More information

Optimal Detector Locations for OD Matrix Estimation

Optimal Detector Locations for OD Matrix Estimation Optimal Detector Locations for OD Matrix Estimation Ying Liu 1, Xiaorong Lai, Gang-len Chang 3 Abstract This paper has investigated critical issues associated with Optimal Detector Locations for OD matrix

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Simplicial Global Optimization

Simplicial Global Optimization Simplicial Global Optimization Julius Žilinskas Vilnius University, Lithuania September, 7 http://web.vu.lt/mii/j.zilinskas Global optimization Find f = min x A f (x) and x A, f (x ) = f, where A R n.

More information

Instituto Nacional de Pesquisas Espaciais - INPE/LAC Av. dos Astronautas, 1758 Jd. da Granja. CEP São José dos Campos S.P.

Instituto Nacional de Pesquisas Espaciais - INPE/LAC Av. dos Astronautas, 1758 Jd. da Granja. CEP São José dos Campos S.P. XXXIV THE MINIMIZATION OF TOOL SWITCHES PROBLEM AS A NETWORK FLOW PROBLEM WITH SIDE CONSTRAINTS Horacio Hideki Yanasse Instituto Nacional de Pesquisas Espaciais - INPE/LAC Av. dos Astronautas, 1758 Jd.

More information

Machine Learning for Software Engineering

Machine Learning for Software Engineering Machine Learning for Software Engineering Introduction and Motivation Prof. Dr.-Ing. Norbert Siegmund Intelligent Software Systems 1 2 Organizational Stuff Lectures: Tuesday 11:00 12:30 in room SR015 Cover

More information

Hidden Units. Sargur N. Srihari

Hidden Units. Sargur N. Srihari Hidden Units Sargur N. srihari@cedar.buffalo.edu 1 Topics in Deep Feedforward Networks Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation

More information

A Note on the Separation of Subtour Elimination Constraints in Asymmetric Routing Problems

A Note on the Separation of Subtour Elimination Constraints in Asymmetric Routing Problems Gutenberg School of Management and Economics Discussion Paper Series A Note on the Separation of Subtour Elimination Constraints in Asymmetric Routing Problems Michael Drexl March 202 Discussion paper

More information

Notes on Multilayer, Feedforward Neural Networks

Notes on Multilayer, Feedforward Neural Networks Notes on Multilayer, Feedforward Neural Networks CS425/528: Machine Learning Fall 2012 Prepared by: Lynne E. Parker [Material in these notes was gleaned from various sources, including E. Alpaydin s book

More information

Fast Learning for Big Data Using Dynamic Function

Fast Learning for Big Data Using Dynamic Function IOP Conference Series: Materials Science and Engineering PAPER OPEN ACCESS Fast Learning for Big Data Using Dynamic Function To cite this article: T Alwajeeh et al 2017 IOP Conf. Ser.: Mater. Sci. Eng.

More information

Learning internal representations

Learning internal representations CHAPTER 4 Learning internal representations Introduction In the previous chapter, you trained a single-layered perceptron on the problems AND and OR using the delta rule. This architecture was incapable

More information

A Compensatory Wavelet Neuron Model

A Compensatory Wavelet Neuron Model A Compensatory Wavelet Neuron Model Sinha, M., Gupta, M. M. and Nikiforuk, P.N Intelligent Systems Research Laboratory College of Engineering, University of Saskatchewan Saskatoon, SK, S7N 5A9, CANADA

More information

OMBP: Optic Modified BackPropagation training algorithm for fast convergence of Feedforward Neural Network

OMBP: Optic Modified BackPropagation training algorithm for fast convergence of Feedforward Neural Network 2011 International Conference on Telecommunication Technology and Applications Proc.of CSIT vol.5 (2011) (2011) IACSIT Press, Singapore OMBP: Optic Modified BackPropagation training algorithm for fast

More information

Application of Improved Discrete Particle Swarm Optimization in Logistics Distribution Routing Problem

Application of Improved Discrete Particle Swarm Optimization in Logistics Distribution Routing Problem Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 3673 3677 Advanced in Control Engineeringand Information Science Application of Improved Discrete Particle Swarm Optimization in

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

A Connection between Network Coding and. Convolutional Codes

A Connection between Network Coding and. Convolutional Codes A Connection between Network Coding and 1 Convolutional Codes Christina Fragouli, Emina Soljanin christina.fragouli@epfl.ch, emina@lucent.com Abstract The min-cut, max-flow theorem states that a source

More information

A Meta-heuristic Applied for a Topologic Pickup and Delivery Problem with Time Windows Constraints

A Meta-heuristic Applied for a Topologic Pickup and Delivery Problem with Time Windows Constraints A Meta-heuristic Applied for a Topologic Pickup and Delivery Problem with Time Windows Constraints Jesús Fabián López Pérez Post-Graduate Program of Management Science, FACPYA UANL, Monterrey, México fabian.lopez@e-arca.com.mx

More information

European Journal of Science and Engineering Vol. 1, Issue 1, 2013 ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM IDENTIFICATION OF AN INDUCTION MOTOR

European Journal of Science and Engineering Vol. 1, Issue 1, 2013 ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM IDENTIFICATION OF AN INDUCTION MOTOR ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM IDENTIFICATION OF AN INDUCTION MOTOR Ahmed A. M. Emam College of Engineering Karrary University SUDAN ahmedimam1965@yahoo.co.in Eisa Bashier M. Tayeb College of Engineering

More information

MODIFIED KALMAN FILTER BASED METHOD FOR TRAINING STATE-RECURRENT MULTILAYER PERCEPTRONS

MODIFIED KALMAN FILTER BASED METHOD FOR TRAINING STATE-RECURRENT MULTILAYER PERCEPTRONS MODIFIED KALMAN FILTER BASED METHOD FOR TRAINING STATE-RECURRENT MULTILAYER PERCEPTRONS Deniz Erdogmus, Justin C. Sanchez 2, Jose C. Principe Computational NeuroEngineering Laboratory, Electrical & Computer

More information

International Journal of Electrical and Computer Engineering 4: Application of Neural Network in User Authentication for Smart Home System

International Journal of Electrical and Computer Engineering 4: Application of Neural Network in User Authentication for Smart Home System Application of Neural Network in User Authentication for Smart Home System A. Joseph, D.B.L. Bong, and D.A.A. Mat Abstract Security has been an important issue and concern in the smart home systems. Smart

More information

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Ensemble methods in machine learning. Example. Neural networks. Neural networks Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you

More information

Artificial Neuron Modelling Based on Wave Shape

Artificial Neuron Modelling Based on Wave Shape Artificial Neuron Modelling Based on Wave Shape Kieran Greer, Distributed Computing Systems, Belfast, UK. http://distributedcomputingsystems.co.uk Version 1.2 Abstract This paper describes a new model

More information

Instantaneously trained neural networks with complex inputs

Instantaneously trained neural networks with complex inputs Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2003 Instantaneously trained neural networks with complex inputs Pritam Rajagopal Louisiana State University and Agricultural

More information

Title: Increasing the stability and robustness of simulation-based network assignment models for largescale

Title: Increasing the stability and robustness of simulation-based network assignment models for largescale Title: Increasing the stability and robustness of simulation-based network assignment models for largescale applications Author: Michael Mahut, INRO Consultants Inc. Larger-scale dynamic network models

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Approximating TSP Solution by MST based Graph Pyramid

Approximating TSP Solution by MST based Graph Pyramid Approximating TSP Solution by MST based Graph Pyramid Y. Haxhimusa 1,2, W. G. Kropatsch 2, Z. Pizlo 1, A. Ion 2 and A. Lehrbaum 2 1 Department of Psychological Sciences, Purdue University 2 PRIP, Vienna

More information

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India.

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India. Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Training Artificial

More information

Tolerance based Greedy Heuristics for the Asymmetric TSP. Gerold Jäger Martin Luther University Halle-Wittenberg

Tolerance based Greedy Heuristics for the Asymmetric TSP. Gerold Jäger Martin Luther University Halle-Wittenberg Tolerance based Greedy Heuristics for the Asymmetric TSP Gerold Jäger Martin Luther University Halle-Wittenberg Cooperation with Boris Goldengorin DFG Project: Paul Molitor December 21, 200 Overview 1

More information

Artificial Neural Networks Lecture Notes Part 5. Stephen Lucci, PhD. Part 5

Artificial Neural Networks Lecture Notes Part 5. Stephen Lucci, PhD. Part 5 Artificial Neural Networks Lecture Notes Part 5 About this file: If you have trouble reading the contents of this file, or in case of transcription errors, email gi0062@bcmail.brooklyn.cuny.edu Acknowledgments:

More information

A combination of clustering algorithms with Ant Colony Optimization for large clustered Euclidean Travelling Salesman Problem

A combination of clustering algorithms with Ant Colony Optimization for large clustered Euclidean Travelling Salesman Problem A combination of clustering algorithms with Ant Colony Optimization for large clustered Euclidean Travelling Salesman Problem TRUNG HOANG DINH, ABDULLAH AL MAMUN Department of Electrical and Computer Engineering

More information

ALGORITHM CHEAPEST INSERTION

ALGORITHM CHEAPEST INSERTION Version for STSP ALGORITHM CHEAPEST INSERTION. Choose the two furthest vertices i and k as initial subtour (c ik = max {c hj : (h, j) A}); set V := V \ {i} \ {k} (set of the unvisited vertices).. For each

More information

Using CODEQ to Train Feed-forward Neural Networks

Using CODEQ to Train Feed-forward Neural Networks Using CODEQ to Train Feed-forward Neural Networks Mahamed G. H. Omran 1 and Faisal al-adwani 2 1 Department of Computer Science, Gulf University for Science and Technology, Kuwait, Kuwait omran.m@gust.edu.kw

More information

A New Combinatorial Design of Coded Distributed Computing

A New Combinatorial Design of Coded Distributed Computing A New Combinatorial Design of Coded Distributed Computing Nicholas Woolsey, Rong-Rong Chen, and Mingyue Ji Department of Electrical and Computer Engineering, University of Utah Salt Lake City, UT, USA

More information

Technische Universität München, Zentrum Mathematik Lehrstuhl für Angewandte Geometrie und Diskrete Mathematik. Combinatorial Optimization (MA 4502)

Technische Universität München, Zentrum Mathematik Lehrstuhl für Angewandte Geometrie und Diskrete Mathematik. Combinatorial Optimization (MA 4502) Technische Universität München, Zentrum Mathematik Lehrstuhl für Angewandte Geometrie und Diskrete Mathematik Combinatorial Optimization (MA 4502) Dr. Michael Ritter Problem Sheet 4 Homework Problems Problem

More information

Synergy Of Clustering Multiple Back Propagation Networks

Synergy Of Clustering Multiple Back Propagation Networks 650 Lincoln and Skrzypek Synergy Of Clustering Multiple Back Propagation Networks William P. Lincoln* and Josef Skrzypekt UCLA Machine Perception Laboratory Computer Science Department Los Angeles CA 90024

More information

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can 208 IEEE TRANSACTIONS ON MAGNETICS, VOL 42, NO 2, FEBRUARY 2006 Structured LDPC Codes for High-Density Recording: Large Girth and Low Error Floor J Lu and J M F Moura Department of Electrical and Computer

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

CT79 SOFT COMPUTING ALCCS-FEB 2014

CT79 SOFT COMPUTING ALCCS-FEB 2014 Q.1 a. Define Union, Intersection and complement operations of Fuzzy sets. For fuzzy sets A and B Figure Fuzzy sets A & B The union of two fuzzy sets A and B is a fuzzy set C, written as C=AUB or C=A OR

More information

V.Petridis, S. Kazarlis and A. Papaikonomou

V.Petridis, S. Kazarlis and A. Papaikonomou Proceedings of IJCNN 93, p.p. 276-279, Oct. 993, Nagoya, Japan. A GENETIC ALGORITHM FOR TRAINING RECURRENT NEURAL NETWORKS V.Petridis, S. Kazarlis and A. Papaikonomou Dept. of Electrical Eng. Faculty of

More information

Chapter 4. Adaptive Self-tuning : A Neural Network approach. 4.1 Introduction

Chapter 4. Adaptive Self-tuning : A Neural Network approach. 4.1 Introduction Chapter 4 Adaptive Self-tuning : A Neural Network approach 4.1 Introduction Machine learning is a method of solving real world problems by employing the hidden knowledge present in the past data or data

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Attractor of Local Search Space in the Traveling Salesman Problem

Attractor of Local Search Space in the Traveling Salesman Problem Attractor of Local Search Space in the Traveling Salesman Problem WEIQI LI School of Management University of Michigan - Flint 303 East Kearsley Street, Flint, Michigan 48502 U. S. A. Abstract: - A local

More information

CMPT 882 Week 3 Summary

CMPT 882 Week 3 Summary CMPT 882 Week 3 Summary! Artificial Neural Networks (ANNs) are networks of interconnected simple units that are based on a greatly simplified model of the brain. ANNs are useful learning tools by being

More information

Dynamic Control and Optimization of Buffer Size for Short Message Transfer in GPRS/UMTS Networks *

Dynamic Control and Optimization of Buffer Size for Short Message Transfer in GPRS/UMTS Networks * Dynamic Control and Optimization of for Short Message Transfer in GPRS/UMTS Networks * Michael M. Markou and Christos G. Panayiotou Dept. of Electrical and Computer Engineering, University of Cyprus Email:

More information

JOURNAL OF OBJECT TECHNOLOGY

JOURNAL OF OBJECT TECHNOLOGY JOURNAL OF OBJECT TECHNOLOGY Online at http://www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2005 Vol. 4, No. 1, January-February 2005 A Java Implementation of the Branch and Bound

More information

Website: HOPEFIELD NETWORK. Inderjeet Singh Behl, Ankush Saini, Jaideep Verma. ID-

Website:   HOPEFIELD NETWORK. Inderjeet Singh Behl, Ankush Saini, Jaideep Verma.  ID- International Journal Of Scientific Research And Education Volume 1 Issue 7 Pages 154-162 2013 ISSN (e): 2321-7545 Website: http://ijsae.in HOPEFIELD NETWORK Inderjeet Singh Behl, Ankush Saini, Jaideep

More information

MATLAB representation of neural network Outline Neural network with single-layer of neurons. Neural network with multiple-layer of neurons.

MATLAB representation of neural network Outline Neural network with single-layer of neurons. Neural network with multiple-layer of neurons. MATLAB representation of neural network Outline Neural network with single-layer of neurons. Neural network with multiple-layer of neurons. Introduction: Neural Network topologies (Typical Architectures)

More information

Gaussian and Exponential Architectures in Small-World Associative Memories

Gaussian and Exponential Architectures in Small-World Associative Memories and Architectures in Small-World Associative Memories Lee Calcraft, Rod Adams and Neil Davey School of Computer Science, University of Hertfordshire College Lane, Hatfield, Herts AL1 9AB, U.K. {L.Calcraft,

More information

Improvement heuristics for the Sparse Travelling Salesman Problem

Improvement heuristics for the Sparse Travelling Salesman Problem Improvement heuristics for the Sparse Travelling Salesman Problem FREDRICK MTENZI Computer Science Department Dublin Institute of Technology School of Computing, DIT Kevin Street, Dublin 8 IRELAND http://www.comp.dit.ie/fmtenzi

More information

An Empirical Study of Software Metrics in Artificial Neural Networks

An Empirical Study of Software Metrics in Artificial Neural Networks An Empirical Study of Software Metrics in Artificial Neural Networks WING KAI, LEUNG School of Computing Faculty of Computing, Information and English University of Central England Birmingham B42 2SU UNITED

More information

Dynamic Traffic Pattern Classification Using Artificial Neural Networks

Dynamic Traffic Pattern Classification Using Artificial Neural Networks 14 TRANSPORTATION RESEARCH RECORD 1399 Dynamic Traffic Pattern Classification Using Artificial Neural Networks }IUYI HUA AND ARDESHIR FAGHRI Because of the difficulty of modeling the traffic conditions

More information

Improving Lin-Kernighan-Helsgaun with Crossover on Clustered Instances of the TSP

Improving Lin-Kernighan-Helsgaun with Crossover on Clustered Instances of the TSP Improving Lin-Kernighan-Helsgaun with Crossover on Clustered Instances of the TSP Doug Hains, Darrell Whitley, and Adele Howe Colorado State University, Fort Collins CO, USA Abstract. Multi-trial Lin-Kernighan-Helsgaun

More information

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.

More information

Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem

Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem L. De Giovanni M. Di Summa The Traveling Salesman Problem (TSP) is an optimization problem on a directed

More information

REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING. Robert L. Stevenson. usually degrade edge information in the original image.

REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING. Robert L. Stevenson. usually degrade edge information in the original image. REDUCTION OF CODING ARTIFACTS IN LOW-BIT-RATE VIDEO CODING Robert L. Stevenson Laboratory for Image and Signal Processing Department of Electrical Engineering University of Notre Dame Notre Dame, IN 46556

More information

Image Edge Detection Using Ant Colony Optimization

Image Edge Detection Using Ant Colony Optimization Image Edge Detection Using Ant Colony Optimization Anna Veronica Baterina and Carlos Oppus Abstract Ant colony optimization (ACO) is a population-based metaheuristic that mimics the foraging behavior of

More information

Handling Missing Values via Decomposition of the Conditioned Set

Handling Missing Values via Decomposition of the Conditioned Set Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,

More information

CHAPTER 3 A TIME-DEPENDENT k-shortest PATH ALGORITHM FOR ATIS APPLICATIONS

CHAPTER 3 A TIME-DEPENDENT k-shortest PATH ALGORITHM FOR ATIS APPLICATIONS CHAPTER 3 A TIME-DEPENDENT k-shortest PATH ALGORITHM FOR ATIS APPLICATIONS 3.1. Extension of a Static k-sp Algorithm to the Time-Dependent Case Kaufman and Smith [1993] showed that under the consistency

More information

Multi Layer Perceptron trained by Quasi Newton learning rule

Multi Layer Perceptron trained by Quasi Newton learning rule Multi Layer Perceptron trained by Quasi Newton learning rule Feed-forward neural networks provide a general framework for representing nonlinear functional mappings between a set of input variables and

More information

Feedback Alignment Algorithms. Lisa Zhang, Tingwu Wang, Mengye Ren

Feedback Alignment Algorithms. Lisa Zhang, Tingwu Wang, Mengye Ren Feedback Alignment Algorithms Lisa Zhang, Tingwu Wang, Mengye Ren Agenda Review of Back Propagation Random feedback weights support learning in deep neural networks Direct Feedback Alignment Provides Learning

More information

11/14/2010 Intelligent Systems and Soft Computing 1

11/14/2010 Intelligent Systems and Soft Computing 1 Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in

More information

Link Lifetime Prediction in Mobile Ad-Hoc Network Using Curve Fitting Method

Link Lifetime Prediction in Mobile Ad-Hoc Network Using Curve Fitting Method IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.5, May 2017 265 Link Lifetime Prediction in Mobile Ad-Hoc Network Using Curve Fitting Method Mohammad Pashaei, Hossein Ghiasy

More information

Optimal Design of a Parallel Beam System with Elastic Supports to Minimize Flexural Response to Harmonic Loading

Optimal Design of a Parallel Beam System with Elastic Supports to Minimize Flexural Response to Harmonic Loading 11 th World Congress on Structural and Multidisciplinary Optimisation 07 th -12 th, June 2015, Sydney Australia Optimal Design of a Parallel Beam System with Elastic Supports to Minimize Flexural Response

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

1.2 Numerical Solutions of Flow Problems

1.2 Numerical Solutions of Flow Problems 1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian

More information

Query Learning Based on Boundary Search and Gradient Computation of Trained Multilayer Perceptrons*

Query Learning Based on Boundary Search and Gradient Computation of Trained Multilayer Perceptrons* J.N. Hwang, J.J. Choi, S. Oh, R.J. Marks II, "Query learning based on boundary search and gradient computation of trained multilayer perceptrons", Proceedings of the International Joint Conference on Neural

More information

Notes for Lecture 24

Notes for Lecture 24 U.C. Berkeley CS170: Intro to CS Theory Handout N24 Professor Luca Trevisan December 4, 2001 Notes for Lecture 24 1 Some NP-complete Numerical Problems 1.1 Subset Sum The Subset Sum problem is defined

More information

A PRIMAL-DUAL EXTERIOR POINT ALGORITHM FOR LINEAR PROGRAMMING PROBLEMS

A PRIMAL-DUAL EXTERIOR POINT ALGORITHM FOR LINEAR PROGRAMMING PROBLEMS Yugoslav Journal of Operations Research Vol 19 (2009), Number 1, 123-132 DOI:10.2298/YUJOR0901123S A PRIMAL-DUAL EXTERIOR POINT ALGORITHM FOR LINEAR PROGRAMMING PROBLEMS Nikolaos SAMARAS Angelo SIFELARAS

More information