Artificial Neural Network based Curve Prediction


LECTURE COURSE: AUSGEWÄHLTE OPTIMIERUNGSVERFAHREN FÜR INGENIEURE
SUPERVISOR: PROF. CHRISTIAN HAFNER
STUDENTS: ANTHONY HSIAO, MICHAEL BOESCH

Abstract

We use artificial neural networks to perform curve prediction. For this, we have created a class of neural networks (feed-forward multilayer perceptron networks trained with backpropagation) whose topology is determined by their genetic makeup. Using a simple evolutionary strategy on their genes, we optimise the network topologies to solve the problems at hand. With this approach, we could generate networks that predict simple functions such as sin(x), or linear combinations thereof, with moderate computational overhead. However, within the allowed network sizes it was not possible to generate networks that satisfactorily predict more complex functions such as sinc(x) or the NASDAQ composite index. In general, though, generating neural networks with this form of evolutionary strategy appears to be a useful approach, as it substitutes for experience in neural network design.

Introduction

Curve prediction is one of the most popular applications of artificial neural networks. However, the success of using a neural network to solve a given problem is inherently linked to the designer's ability to apply an appropriate network to the task. Even relatively simple artificial neural networks such as the multi-layer perceptron, or variants thereof, have several degrees of freedom, e.g. the number of neurons, the number of hidden layers and the type of transfer functions employed, to which the network is very sensitive. For most tasks there is no methodology for designing a neural network that guarantees success. Instead, we try to evolve a neural network topology that is suitable for a given curve prediction task.

Aim

- To develop an evolvable artificial neural network representation
- To optimise such a neural network to solve a number of curve prediction tasks
- To evaluate the ability of an evolutionary approach to evolve suitable neural networks for a given task

Neural Networks

In order to perform the prediction tasks described above, we use multi-layer perceptron networks and a simple backpropagation learning rule. We then use an evolutionary strategy to change the following parameters of the network:

- Number of hidden layers
- Number of neurons in each layer
- Transfer function employed by the neurons in each layer (neurons in the same layer employ the same transfer function)

To do this, we define a genetic code for this class of neural networks, comprising an N-digit binary bit string. In order to limit the optimisation search space, we arbitrarily limit the number of hidden layers to 10 and the number of neurons per layer to 15. The allowed transfer functions are linear, linear with bounds, and the hyperbolic tangent. Thus, the number of neurons in each hidden layer can be represented by a four-bit number and the transfer function of the neurons in each layer by a two-bit number, totalling six bits per layer. As there are up to 10 hidden layers, the total bit string is 60 bits long and is laid out as follows:

Bit 0 ... Bit 59: [Layer 0][Layer 1][Layer 2][Layer 3][Layer 4][Layer 5][Layer 6][Layer 7][Layer 8][Layer 9]
Each six-bit layer field (shown for Layer 4): [Neuron bit 3][Neuron bit 2][Neuron bit 1][Neuron bit 0][Transfer bit 1][Transfer bit 0]

Each layer can have up to 15 neurons, as given by the binary number [Neuron bit 3 : Neuron bit 0]. If the bit string encodes zero neurons for a layer, that layer is interpreted as being non-existent. Also, since the bit string can encode four transfer functions per layer, as given by the binary number [Transfer bit 1 : Transfer bit 0], but only three are employed, a bias is given towards the linear y = x transfer function, which is encoded by two of the four possible states. Thus, by changing the genes of a network, it takes on a different topology, some more and some less suitable for the tasks at hand.
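As an illustration of this encoding, the following sketch decodes such a 60-bit genome into the hidden layers it describes. It is a minimal sketch only: the class and method names, and the assumption that the neuron bits precede the transfer bits within each layer field (most significant bit first), are ours and not taken from the original implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** One hidden layer as described by a 6-bit gene: neuron count and transfer function. */
class LayerSpec {
    final int neurons;       // 1..15 (a gene encoding 0 means the layer does not exist)
    final String transfer;   // "linear", "bounded linear" or "tanh"
    LayerSpec(int neurons, String transfer) { this.neurons = neurons; this.transfer = transfer; }
}

class GenomeDecoder {
    static final int LAYERS = 10, BITS_PER_LAYER = 6;   // 60-bit genome in total

    /** Decodes a 60-bit genome (bit 0 .. bit 59) into the list of existing hidden layers. */
    static List<LayerSpec> decode(BitSet genome) {
        List<LayerSpec> layers = new ArrayList<>();
        for (int l = 0; l < LAYERS; l++) {
            int base = l * BITS_PER_LAYER;
            int neurons = 0;
            for (int b = 0; b < 4; b++)                  // neuron bits 3..0, MSB first (assumption)
                neurons = (neurons << 1) | (genome.get(base + b) ? 1 : 0);
            int t = ((genome.get(base + 4) ? 1 : 0) << 1) | (genome.get(base + 5) ? 1 : 0);
            // Two of the four transfer codes map to the plain linear y = x function,
            // giving the bias towards it described in the text (mapping chosen here is ours).
            String transfer = (t <= 1) ? "linear" : (t == 2 ? "bounded linear" : "tanh");
            if (neurons > 0)                             // zero neurons: layer is non-existent
                layers.add(new LayerSpec(neurons, transfer));
        }
        return layers;
    }
}
```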

Optimisation

In order to find an optimal network topology for the given tasks, we use an evolutionary strategy to evolve the genetic makeup of the networks; this could also be regarded as a genetic algorithm without crossover.

The Evolutionary Strategy

The algorithm employed works as follows:

1. Produce a first generation, of population size seven, of random bit strings.
2. Generate randomly initialised networks from the population of bit strings.
3. Train the networks on a training set using backpropagation.
4. Run the networks on the test data.
5. A fitness function evaluates the fitness of each network; the fittest network is kept for the next iteration, while the others are discarded.
6. The fittest network is cloned six times to refill the generation, and each clone is mutated by inverting one of the 60 bits at random.
7. Repeat from step 3 until a maximum number of iterations has been performed, or until only marginal or no improvement in fitness is achieved over several iterations.

A table of all bit strings that have already been evaluated is kept, so that the same network topology is not computed multiple times. The core of this loop is sketched below.
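In the following sketch, the evaluate routine is a placeholder for building, training and testing a network from a genome; the surrounding loop implements the keep-the-fittest, clone-and-flip-one-bit scheme described above. The names and the iteration limit are illustrative assumptions, not the original code.

```java
import java.util.BitSet;
import java.util.Random;

class EvolutionLoop {
    static final int POP = 7, GENOME_BITS = 60, MAX_ITER = 200;   // limits are illustrative

    // Placeholder: build, train (backpropagation) and test a network for this genome,
    // returning its fitness (reciprocal of the cumulative prediction error).
    static double evaluate(BitSet genome) { /* ... */ return Math.random(); }

    public static void main(String[] args) {
        Random rnd = new Random();
        BitSet[] population = new BitSet[POP];
        for (int i = 0; i < POP; i++) {                       // 1. random first generation
            population[i] = new BitSet(GENOME_BITS);
            for (int b = 0; b < GENOME_BITS; b++) if (rnd.nextBoolean()) population[i].set(b);
        }
        for (int iter = 0; iter < MAX_ITER; iter++) {
            BitSet best = null;
            double bestFitness = Double.NEGATIVE_INFINITY;
            for (BitSet g : population) {                     // 2.-5. evaluate, keep the fittest
                double f = evaluate(g);
                if (f > bestFitness) { bestFitness = f; best = g; }
            }
            System.out.println("iteration " + iter + ", best fitness " + bestFitness);
            for (int i = 0; i < POP; i++) {                   // 6. clone the winner and mutate
                population[i] = (BitSet) best.clone();
                if (i > 0) population[i].flip(rnd.nextInt(GENOME_BITS));  // one random bit flip
            }
            // 7. In the full version: stop early when fitness stagnates, and skip genomes
            //    that are already in the table of evaluated bit strings.
        }
    }
}
```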

We evaluate the fitness of the networks in two ways, depending on the task at hand.

For short term prediction, a part of the test signal is used as input to the network, and the first value the network predicts is compared with the actual value of the signal at that point. The cumulative error is found by summing the absolute difference between the predicted value and the actual value over all shifted versions of the actual signal used as input to the network. The fitness of the network is the reciprocal of the cumulative error. This method evaluates the network's ability to make short term predictions for a given pattern and number of inputs (see Figure 1).

Figure 1: Short term prediction method

For long term prediction, a part of the signal is used as input to the network, the first predicted value is fed back and used as the next input, and this is repeated for a given number of points to be predicted. The fitness is then the reciprocal of the cumulative absolute difference between the actual signal and the recurrently predicted signal. This method evaluates the network's ability to make long term predictions (forecasts) for a given starting pattern and number of points to predict (see Figure 2).

Figure 2: Long term prediction method

In both cases, the error is floored at a small minimum value (and the fitness thus capped) in order to avoid division-by-zero errors and infinite fitness. Furthermore, smaller networks, i.e. networks with fewer hidden layers, are preferred over larger ones, as are networks with fewer neurons. Both fitness measures are sketched in code below.
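In the sketch below, the network is reduced to a placeholder predict method; the window handling and the error floor value are illustrative assumptions rather than the original implementation.

```java
import java.util.Arrays;

class FitnessEvaluator {
    static final double MIN_ERROR = 1e-6;   // error floor to avoid infinite fitness (assumed value)

    // Stand-in for a trained network mapping the last `inputs` samples to one predicted value.
    interface Net { double predict(double[] window); }

    /** Short term: one-step prediction over all shifted windows of the test signal. */
    static double shortTermFitness(Net net, double[] signal, int inputs) {
        double error = 0.0;
        for (int i = 0; i + inputs < signal.length; i++) {
            double[] window = Arrays.copyOfRange(signal, i, i + inputs);
            error += Math.abs(net.predict(window) - signal[i + inputs]);
        }
        return 1.0 / Math.max(error, MIN_ERROR);
    }

    /** Long term: feed each prediction back as the next input and compare the forecast. */
    static double longTermFitness(Net net, double[] signal, int inputs, int horizon) {
        double[] window = Arrays.copyOfRange(signal, 0, inputs);
        double error = 0.0;
        for (int k = 0; k < horizon && inputs + k < signal.length; k++) {
            double predicted = net.predict(window);
            error += Math.abs(predicted - signal[inputs + k]);
            System.arraycopy(window, 1, window, 0, inputs - 1);  // shift the window left
            window[inputs - 1] = predicted;                      // recurrent feedback
        }
        return 1.0 / Math.max(error, MIN_ERROR);
    }
}
```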

The Search Space

The size of the search space can be calculated as follows: each hidden layer is described by six bits and there are up to ten hidden layers, so the genome is a 60-bit string and there are 2^60 ≈ 1.2 x 10^18 possible bit strings. In fact, the search space is slightly smaller than this, because some of the topologies in which one or more layers have zero neurons are equivalent. Still, it can be appreciated that the search space is large enough to justify this optimisation approach.

Evaluation and Discussion

We developed a software application in Java that implements this evolvable artificial neural network representation and allows our evolutionary strategy to evolve the topology of the networks. To evaluate our approach, we adopted the following testing strategy.

Testing Strategy

There are two tasks, short- and long-term (function) prediction, that our neural networks have to perform. We qualitatively assess the ability of the evolved networks to perform each task using the following representative test signals:

- Sinusoidal function: an arbitrary sinusoidal function such as sin(x) with a given amplitude, frequency and phase. This is probably the simplest test signal, and the networks can be expected to manage the tasks successfully.
- Aperiodic function: the aperiodic function sinc(x) = sin(x)/x. This is a challenging function to predict, as it is neither periodic nor monotonic. It would come as a positive surprise if the evolved networks managed this task successfully.
- Noisy aperiodic or pseudo-random function: an excerpt of the NASDAQ composite index (historical weekly values) is used as an interesting and challenging test signal. An ability of the networks to predict this stock market index would be highly surprising and unexpected.

Apart from running both tasks on these test signals, the following questions shall be addressed:

- Do we always arrive at the same network for the same problem? To answer this question, several trials of the same test are run. If the same network topologies are arrived at most of the time, it implies that the evolutionary strategy converges to a local or global optimum, which is desirable.
- How fit are the networks? As mentioned above, there exists a maximum fitness that a network may achieve. How fit, relative to this maximum achievable fitness, are the evolved networks?

Tabular Summary

The section below describes important aspects of the individual tests in detail. In addition, the tables below summarise the results.

Criteria (5 inputs)      Sin(x)   Sinc(x)   NASDAQ
Short term prediction
Long term prediction
Same networks
Fitness                  17%      15%       2%
%-Error                  36%      86%       48%
Convergence

Table 1: Summary of the tests for 5 inputs

Criteria (10 inputs)     Sin(x)   Sinc(x)   NASDAQ
Short term prediction
Long term prediction
Same networks
Fitness                  95%      15%       2%
%-Error                  0%       113%      44%
Convergence

Table 2: Summary of the tests for 10 inputs

Criteria (20 inputs)     Sin(x)   Sinc(x)   NASDAQ
Short term prediction
Long term prediction
Same networks
Fitness                  99%      15%       3%
%-Error                  0%       95%       27%
Convergence

Table 3: Summary of the tests for 20 inputs

Test Details

1. Sinusoidal

The neural networks evolved are able to predict the sinusoidal signals with acceptable accuracy, provided they receive enough inputs. Figure 3 below illustrates the evolution process over several trials; each point on the graph represents an improved network topology over the previous one.

Figure 3: Fitness evolution for different trials for Sin(x) - Clear fitness improvement

As with the fitness, the error performance of the evolved networks improves. Figure 4 below illustrates how the %-error in the long term prediction generally decreases with each generation.

Figure 4: Error evolution for different trials for Sin(x) - Clear performance improvement

Comparing the size of a network (the number of neurons in its hidden layers) to its performance, it appears that there is a certain range of right sizes that allows a network to achieve high fitness.

Another way of looking at this is that the network needs a certain minimum complexity (in terms of the number of neurons) adequate to the task at hand; below that critical size, it is unlikely that a network can achieve high fitness.

Figure 5: Size does matter - a network needs a certain minimum size or complexity to achieve high fitness

2. Sinc(x)

Unlike in the previous case with sin(x), the evolutionary approach does not generate sufficiently fit networks to perform long-term prediction of the sinc(x) function. Figure 6 and Figure 7 outline the evolutionary performance over several trials. They show clearly that the evolutionary approach works in principle, i.e. networks are evolving and improving; however, predicting sinc(x) seems to be too difficult a task for the simple feed-forward perceptrons employed here. It appears that the network evolution hits a performance limit at about 15% fitness and 90% error.

Figure 6: Fitness evolution for different trials for Sinc(x) - No clear fitness improvement

Figure 7: Error evolution for different trials for Sinc(x) - The error decreases, but is still unacceptably high

3. NASDAQ

Attempting to perform long-term prediction on the NASDAQ is ambitious. Here, the evolutionary approach again works to some extent, in that it generates networks whose performance improves over several generations; however, the network model and its complexity are again not able to cope with the challenge posed by the NASDAQ. Figure 8 and Figure 9 summarise the network evolution over several trials.

Figure 8: Fitness evolution for different trials for the NASDAQ - Too difficult for the networks

Figure 9: Error evolution for different trials for the NASDAQ - Error performance improves, but it is still too high

General Comments

The networks' performance is closely tied to some randomness in the initialisation and in the success of the training. The training method employed, backpropagation, is not guaranteed to achieve a satisfactory level of training and does not necessarily find the globally optimal parameters for the network. To mitigate this, we repeated the training of each network several times to increase the likelihood of obtaining a well-trained network. This, however, increased the computational load considerably, to an impractical degree, and is therefore not an adequate remedy for decoupling the success of the evolutionary approach from its sensitivity to randomness and initial conditions. A sketch of this restart procedure is given below.
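A minimal sketch of the restart remedy, with training and fitness evaluation left as placeholders; the names and structure are ours, not the original code.

```java
import java.util.Random;

class RestartTraining {
    // Placeholders for building a randomly initialised network of a fixed topology,
    // training it with backpropagation, and measuring its fitness on the test set.
    interface Network { }
    static Network buildAndTrain(long seed) { /* ... */ return null; }
    static double fitness(Network net)      { /* ... */ return 0.0; }

    /** Train the same topology `restarts` times from random initialisations and keep the best run. */
    static Network bestOfRestarts(int restarts) {
        Random rnd = new Random();
        Network best = null;
        double bestFitness = Double.NEGATIVE_INFINITY;
        for (int r = 0; r < restarts; r++) {
            Network candidate = buildAndTrain(rnd.nextLong());   // new random initialisation
            double f = fitness(candidate);
            if (f > bestFitness) { bestFitness = f; best = candidate; }
        }
        return best;
    }
}
```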

Conclusion

Our aims were to develop an evolvable artificial neural network representation, to optimise such a network to solve a number of curve prediction tasks, and to evaluate the ability of an evolutionary approach to evolve suitable neural networks for a given task. The performance of an artificial neural network on a given problem is inherently tied to the topology of the network, and without an appropriate topology it may prove difficult to produce satisfactory results. We therefore set out to create a neural network design framework that replaces the designer's experience with an evolutionary strategy.

We have developed a bit string representation that corresponds to the genetic makeup of a multi-layer perceptron network, allowing optimisation of the following network parameters: the number of hidden layers, the number of neurons in each hidden layer, and the transfer function employed by the neurons in each layer. Using a simple evolutionary strategy, we tried to optimise the network topology for the tasks at hand by optimising the genetic makeup of a generation of networks. In principle, this approach has proven valid, and we have demonstrated the evolution of networks that predict a sine function. For more complex functions such as sinc(x) or the NASDAQ, the evolutionary approach worked, but it was limited by the network model's inherent ability to predict complex patterns.

We are optimistic about this approach to evolving neural network topologies for given tasks, and there are several aspects that could be improved or investigated further. In particular, we suggest the following:

- Within our simulations, we had to limit the search space by limiting the allowed size of the networks and their transfer functions. A more extensive investigation into the evolutionary approach could include other non-linear transfer functions, larger networks, and more interconnected or feedback networks.
- A major limiting computational factor was the learning rule employed (backpropagation). It would be worthwhile to consider other learning rules.
- In our approach, we allow evolution to develop networks of all sizes and shapes uniformly. It might be more fruitful to constrain the evolution to organic growth, i.e. to start with small networks and grow them by evolution.
- We have used our evolutionary strategy for function prediction only. It would be worthwhile to apply this method to other common ANN tasks, such as image recognition.