Design and application of neurocomputers

Loughborough University Institutional Repository

Design and application of neurocomputers

This item was submitted to Loughborough University's Institutional Repository by the author.

Additional Information: A doctoral thesis submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy at Loughborough University.

Metadata Record:

Publisher: © David Naylor

Rights: This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 2.5 Generic (CC BY-NC-ND 2.5) licence. Full details of this licence are available at:

Please cite the published version.

DESIGN AND APPLICATION OF NEUROCOMPUTERS

by

David C.J. Naylor, B.Eng.

A Doctoral Thesis

Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of the Loughborough University of Technology.

January 1994

© by David Naylor 1994

ABSTRACT

This thesis aims to understand how to design high performance, flexible and cost effective neural computing systems and apply them to a variety of real-time applications. Systems of this type already exist for the support of a range of ANN models. However, many of these designs have concentrated on optimising the architecture of the neural processor and have generally neglected other important aspects. If these neural systems are to be of practical benefit to researchers and allow complex neural problems to be solved efficiently, all aspects of their design must be addressed.

This thesis investigates two particular areas of neural system design and application. The first is a study of the hardware characteristics of a neural processing architecture, to determine the most efficient and effective network structural mapping strategies. Initially, a model of hardware performance and utilisation characteristics is developed, describing the implementation of the Back Propagation learning algorithm in a linear array architecture. Using this model, a number of important structural relationships are discovered between the different layers in a network, which can influence the performance of the neural hardware. These are presented as a set of guidelines that can assist neural application designers in their choice of suitable network structures. This choice will then ensure that not only are the algorithmic performance requirements of an application achieved but also that the capabilities of the neural hardware are fully exploited.

The second investigation is a design study for a neural computing system with real-time, high bandwidth image processing application requirements. The effective integration of the neural system within an existing environment is a key consideration for designers, if the capabilities of the neural processing architecture are to be efficiently exploited by real applications. The study identifies and addresses a number of critical design issues including the control of the neural processors, the physical system construction, and the strategy for coupling to the non-neural environment.

ACKNOWLEDGEMENTS

I sincerely wish to thank my supervisor, Professor Simon Jones, for his support over the last 3 years. His criticism, guidance and encouragement have been invaluable to me, not only during my research but also in the writing of this thesis. I must also thank him for the opportunities I have had to attend many international conferences. These have been very rewarding experiences.

Secondly I wish to thank Mr. David Myers at British Telecom Research Laboratory for his support and welcomed criticism. I must also thank Dr. Mike Whybray, Dr. John Vincent, Mr. John Harbridge, Mr. Colin Williamson, Mr. Tony Briers and Mr. Dave Orrey for their help and advice. It has been greatly appreciated.

I must also thank all the members of the Electronic System Design Group at Loughborough and previously, those of the Parallel and Novel Architectures Group at the University of Nottingham. They have been the sources of many useful and stimulating discussions. I particularly wish to acknowledge Mr. Mark Gooch and Dr. Andrew Spray in this respect.

To my parents, I send my deepest gratitude. I wish to thank them for their support and encouragement over the last 3 years - I will always be indebted to them for the education they have afforded me. Also, I must send a special thank you to Lisa for always being there to help me through.

For my financial support I wish to thank the Science and Engineering Research Council, without whom this work would not have been possible. Furthermore, I wish to thank British Telecom for their generous sponsorship of this project.

Finally, I must send a special thank you to Dr. Karl Sammut. His guidance, support and friendship throughout the last 3 years have been deeply appreciated.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
STATEMENT OF ORIGINALITY
TABLE OF CONTENTS

CHAPTER ONE - INTRODUCTION
    Background
    Motivation
        Applications of Artificial Neural Networks
        The Hardware Implementation of Neural Networks
    Objectives of the Thesis
    Structure of the Thesis

CHAPTER TWO - REVIEW
    Objectives of the Review
    Characteristics of Artificial Neural Networks
    Neural Learning Algorithms
        Hopfield
        Bidirectional Associative Memory
        Back Propagation
        Restricted Coulomb Energy
        The Adaptive Resonance Theory
        The Self-Organising Feature Map
        Discussion
    Hardware Architectures for Neural Networks
        CNAPS
        BACCHUS
        Torrent
        MA
        ETANN
        ANNA
        Discussion
    Neural Systems
        The CNAPS System
        PAN IV
        CNS-1
        SYNAPSE-1
        Mod
        The ANNA System
    Analysis and Comparison of Neural Systems
    Conclusions from the Review

CHAPTER THREE - OUTLINE OF INVESTIGATION
    Objectives of the Chapter
    Objectives of the Research
        Identification of Research Topics
        Statement of Research Objectives
    Experimental Vehicles
        HANNIBAL
        Image Processing Application
        System Environment
    Investigation Summaries
        Neural Algorithm Characterisation
        Network Structural Mapping
        Hardware Design Study

CHAPTER FOUR - HANNIBAL
    Objectives of the Chapter
    Array Architecture
    Processor Architecture
    Back Propagation Implementation
    Physical Characteristics
    The HANNIBAL Simulator
    Summary

CHAPTER FIVE - THE APPLICATION AND ENVIRONMENT
    Objectives of the Chapter
    The Application
        Motivation
        Basic Requirements Definition
        The Solution
    System Environment
        Hardware
        Software
    Summary

CHAPTER SIX - NEURAL ALGORITHM CHARACTERISATION
    Objectives of the Chapter
    Objectives of the Investigation
    Back Propagation Implementation Characteristics
        Processor Level Characterisation
        Array Level Characterisation
    Characteristics Modelling
        Recall Stage Model
        Learning Stage Model
    Conclusions

CHAPTER SEVEN - NETWORK STRUCTURAL MAPPING
    Objectives of the Chapter
    Objectives of the Investigation
    Methodology
    Recall Stage Results
        Analysis
    Learning Stage Results
        Analysis
    Network Structural Mapping Guidelines
    Conclusions

CHAPTER EIGHT - HARDWARE DESIGN STUDY
    Objectives of the Chapter
    Objectives of the Design Study
    Design Specification Summary
    Critical Design Issues
        Operating Frequency
        System Communications
        Interfacing Hardware
        Controller
        Software Integration
    Design Implementation
        Overview
        Implementing the Feature Location Application
        System Architecture
        Host Software Architecture
        Design Status
    Performance Assessment
        Recall
        Learning
    Conclusions

CHAPTER NINE - CONCLUSIONS
    Objectives of the Chapter
    Review of Objectives
    Conclusions
        Measurement of Success
        Limitations of the Work
    Further Work
        Extensions of Current Investigations
        Further Investigations
    Summary

REFERENCES

PUBLICATIONS

CHAPTER ONE

INTRODUCTION

1.1 Background

The development of new, more complex and larger applications is placing ever increasing demands on computing hardware technology. Many of these applications are related to the fields of vision and speech processing, and are characterised by high processing and communication bandwidth requirements. Furthermore, their complexity is such that it is often difficult to formulate a complete set of rules to govern their response. Therefore, employing conventional algorithmic modelling techniques can result in ill-defined output characteristics. It is therefore desirable and necessary to develop alternative approaches for the implementation of these applications.

Artificial Neural Networks (ANNs) employ architectural structures that are inspired by the brain and algorithms that attempt to mimic its ability for determining complex input-output associations that cannot be easily quantified. ANNs achieve this by processing sample data and storing sufficient, appropriate information within the network's structure to be able to recall the association at any time, when presented with similar input data. Hence, the ANN learns by example the required output response. Furthermore, it has the ability to generalise and by that, can respond in an informed manner to previously unseen input data.

Neurons are the primitive processing elements of ANNs. Input stimuli or signals are received from any number of other neurons along connecting synapses. The output of the neuron is determined by the total strength of its inputs, and is itself distributed as an input to many other neurons. The strength of the connection, or the synaptic weight value, between two neurons determines how influential the output of one neuron is, when calculating the output of the other. It is these weight values that represent the information relating to the input-output associations that a network has learnt. Hence, the ANN learning algorithm is used to determine the strengths of all the inter-neuron connections for a particular set of input-output relationships.

The work of a few key researchers has been the basis for the development of many, widely applied ANN models. McCulloch and Pitts are recognised as the early pioneers (circa 1943) in many algebraic aspects of neural networks. In 1949, Hebb proposed a learning law for ANNs and postulated that "... repeated activation of one neuron by another, across a particular synapse, increases its conductance ..." - a statement that embodies the principles of neuron learning or neurodynamics. The next significant step forward came in 1958 when Rosenblatt introduced his work on a neuron model called the Perceptron. This was followed two years later by the Adaptive Linear or Adaline model from Widrow and Hoff [1]. In 1969, Minsky and Papert published research that showed there were limits to the learning capabilities of both models and that suitable learning schemes for so-called 'hidden layer' Perceptrons did not exist. Many researchers then began to turn their attention elsewhere. Further details of this and other early work in the field can be found in the literature [2].

A resurgence of interest occurred in the 1980s when solutions to the learning problems of Perceptrons were published [3]. A number of proposals were also put forward for new ANN models, including the Hopfield network [4] and the Adaptive Resonance Theory of Carpenter and Grossberg [5].

These ANN models chiefly contrast in terms of their neuron interconnection strategies and the complexity of their learning algorithm.

1.2 Motivation

1.2.1 Applications of Artificial Neural Networks

The use of ANNs is now becoming widespread in many commercial and industrial fields as their capabilities are beginning to be discovered. Applications that particularly benefit from implementation in an ANN are often characterised by complex input-output relationships, noisy or corrupted input data, and/or continually adaptable input data. Typical applications which exhibit these characteristics include:

- Control systems for monitoring and adjusting unpredictable or complex industrial processes in noisy environments [6,7].

- Financial forecasting [6] models use neural networks to analyse the complex interaction between several independent variables such as employment figures, trade balance and gross national product. The neural model then offers short or long term financial advice. The complexity of the input-output relationships in such an application is very high.

- Image processing and pattern recognition. This is a major field in which neural networks have found many practical uses. Applications such as translation, rotation and scale invariant pattern recognition [8], character recognition [6,9], image segmentation and compression [10,11], and motion detection and tracking [12] have all been able to exploit the characteristics of ANNs.

1.2.2 The Hardware Implementation of Neural Networks

The combination of high image resolution and a real-time operation requirement can demand a processing bandwidth in excess of 100 Mbyte/s. Achieving this throughput rate for image processing applications therefore requires the support of high performance hardware. However, the predominant calculation occurring in a neural algorithm, particularly during the recall stage, is the matrix-vector operation. This involves the accumulation and nonlinear thresholding of a neuron's input-synaptic weight products to generate the neuron's output. Hence, while the number of neural calculations increases with O(N^2 + N) for N neurons, the nature of the operations that are being performed remains of the same complexity.

State of the art microprocessor technology is one option for the implementation of these algorithms, since this can accommodate the processing bandwidth requirements of typical image applications. However, these devices are very general, complex designs and their high functionality goes beyond the requirements of typical neural algorithms. Hence, an optimised hardware architecture that can exploit the characteristics of the neural algorithm in its design is more likely to offer a cost effective solution.
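
To make the dominant recall-stage computation concrete, the following sketch performs the matrix-vector accumulation and non-linear thresholding for one layer of neurons. It is an illustration only; the layer size, weight values and the choice of a sigmoid threshold are assumptions for the example and are not taken from any particular neural processor discussed in this thesis.

```python
import numpy as np

def recall_layer(weights, inputs, bias):
    """Recall stage for one layer: accumulate each neuron's input-weight
    products, then apply a non-linear threshold (sigmoid here).
    For N neurons with N inputs this is O(N^2) multiply-accumulates."""
    activations = weights @ inputs + bias        # matrix-vector accumulation
    return 1.0 / (1.0 + np.exp(-activations))    # non-linear thresholding

# Illustrative sizes only: a 256-input, 256-neuron layer.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 256))
x = rng.random(256)
print(recall_layer(W, x, np.zeros(256))[:4])
```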

Specific hardware architectures have therefore been developed for the implementation of many neural applications, including image processing. The approaches that have been adopted by designers of neural specific hardware are diverse, and depend upon the range of ANN models and applications being implemented.

In many applications, the neural hardware will not represent the complete solution. Instead, it will simply accelerate the neural processing stage of an application that was previously implemented in software. Hence, the neural computing hardware is often a component in a larger system and consequently, it is generally considered an enhancing rather than a replacing technology. Some key considerations for the design and application of neural specific hardware include:

- The architecture and interconnection strategy of the neural processors.
- Control of the neural processors.
- The hardware and software integration of neural and non-neural systems.
- The physical construction/organisation of the neural system.
- Strategies for mapping neural applications into the hardware.

These issues have been investigated to a greater or lesser extent by researchers. However, if ANNs are to offer a practical and efficient approach to the solution of complex, computationally intensive problems then the full capabilities of neural specific architectures must be exploited by the designers of both system hardware and applications. All these aspects must therefore be carefully considered.

1.3 Objectives of the Thesis

This thesis aims to understand how to design and apply high performance, flexible and cost effective neural computing systems, or neurocomputers, for a variety of real-time applications. Systems of this type already exist for the support of a range of ANN models. However, many of these designs have concentrated on optimising the architecture of the neural processor and have generally neglected other important aspects. This thesis examines two of these areas, namely suitable strategies for the structural mapping of neural networks into a neural architecture to ensure an efficient and effective exploitation of hardware resources, and the engineering design issues that are associated with the control of a neurocomputer, its physical construction and its coupling to other, non-neural systems.

The field of neural computation has now evolved from a theoretical to a practical level. Many ANN algorithms have been developed over recent years, along with specific hardware architectures for their implementation. If this technology is to be usefully and efficiently exploited for the solution of complex and computationally intensive problems, both of the above issues must be addressed. In the first instance, system hardware designers must ensure that the neural subsystem is an integral part of a tightly coupled processing environment. Secondly, application designers must tailor their neural networks to capitalise fully on all the hardware resources available in the neural system.

1.4 Structure of the Thesis

Chapter One introduces the field and discusses the development of artificial neural networks. It explains the motivation for utilising specialised hardware support in ANN applications and introduces some of the issues that are associated with the implementation of a complete neural system.

A review of related work is provided in Chapter Two. First, the characteristics of ANN models are summarised to assist in the study of a cross section of ANN learning algorithms. The review then discusses the characteristics of a range of special neural processors that have been developed for the implementation of these models. Finally, it examines how they have been incorporated into complete neural systems. Comparisons of these systems and their neural architectures help to identify the aspects of their design and application which lack coherent strategies and provide the basis for this research.

Chapter Three proposes 3 investigations. These are the neural algorithm characterisation, the network structural mapping and the hardware design study. It also explains the experimental assumptions used in these investigations. The linear array neural processor, HANNIBAL, is used as the research vehicle in the investigations. Its architecture is detailed in Chapter Four. Chapter Five explains the requirements of the image processing application and features of the system environment that are utilised in the hardware design study.

Chapter Six is the first investigation and provides a detailed methodological approach to the characterisation of a neural algorithm implemented in the HANNIBAL processor. The model produced is used in Chapter Seven for an investigation into the strategies for structuring multi-layer neural networks when mapping them into the linear array architecture.

The results are presented as a set of mapping guidelines which application designers may use to improve hardware performance.

The hardware design study in Chapter Eight aims to identify and address the design of the components of a neural computing subsystem that are critical to achieving its goals. After detailing the specification of the system, a range of issues is identified and discussed, before presenting the selected design strategy. An assessment of performance is also included.

Chapter Nine draws together the conclusions from each investigation and discusses whether the objectives have been achieved. It examines the limitations of the work and outlines possible extensions to the existing investigations, as well as further research areas that may be of interest. Finally it summarises the main points of the thesis.

CHAPTER TWO

REVIEW

2.1 Objectives of the Review

This chapter presents a review of the field of ANNs as a basis for the work presented in this thesis. The objectives of the chapter are:

- To define the common characteristics of ANN models and review the learning and recall procedures of a representative sample.
- To outline a number of hardware architectures that have been developed for the implementation of these models. Then, to examine how these architectures have been integrated at the system level in the development of neurocomputers.
- To identify the key issues for research relating to the design and application of these systems.

2.2 Characteristics of Artificial Neural Networks

A variety of ANN models exist that differ in their physical and algorithmic structure, and can be classified according to a number of characteristics. The main features that distinguish each model are outlined below.

Network Topology. Neurons are arranged in a layer-wise manner in most ANN models, but the number of layers and their interconnection strategy differentiate between them. Two common structures are shown in Figure 2.1. The multilayer feedforward network shown in (a) consists of several layers of neurons with unidirectional connections between adjacent layers. This example shows a sparse network in which not all the possible connections are made. Any layers that are not directly connected to the outside environment are called hidden layers. The aim of a sparse network interconnection strategy is to create localised decision regions that perform specific neural tasks. By connecting several layers together it is possible to create complex decision surfaces, capable of characterising any arbitrary input-output relationship.

The single layer recurrent network shown in (b) introduces a time related response to an input stimulus and hence, is a form of dynamic network (as compared to the static network in (a)). The outputs from each neuron are fed back via weighted connections in an iterative scheme that will reach a steady state over time. This class of network is important for the modelling of nonlinear dynamic control systems or can simply provide a form of short term memory.

Learning Algorithm. Learning is the process of determining the strengths or weight values of the synaptic connections between the neurons in a network. This may be performed in a single-shot or recursive manner. The single-shot method involves the straightforward calculation of the weights given the complete set of input vectors. Therefore the weight values store a direct representation of the input vectors. When an input is presented the resulting output is simply the best matching stored pattern. This is known as an autoassociative learning algorithm.

Recursive learning algorithms are either supervised or unsupervised. Both of these adapt their weights in an iterative process until a steady or 'acceptable' state is reached.

In supervised learning the weights are modified to reflect the difference between the expected and actual output vector for a particular input. When this error decreases below a preset threshold the network is considered to have converged and the learning process is halted. Unsupervised learning does not require example output patterns to guide the network to a solution. These learning algorithms examine each input vector in turn and formulate their own output representations that differentiate between dissimilar inputs. Both supervised and unsupervised networks can be used to map a particular input pattern set into a different output set - a heteroassociative learning algorithm. A classifier is a simple form of a heteroassociative network.

There are a number of ways to determine if the solution to which the network has converged is acceptable. The most common method is to examine the size of weight modifications for a pass of the learning data set. If all the weight updates during this operation are below a certain threshold value then the network has converged. This method does not guarantee that the solution is correct however, as a local minimum in the decision surface could have been found. In this situation, the weights will be stable but the output will not always be correct. To verify that a global solution has been obtained the output must be analysed. Data that the network has not seen before - a test set - is often applied for this purpose. How accurately the network's output matches the expected result can be used as a measure of the learning success.

Recall Strategy. Regardless of the network topology or the learning algorithm, the output of a neuron can generally be defined in a simple form as

    Neuron Output = Non-linear function ( Σ_{all synapses} Input × Weight )    (2.1)

In multilayered feedforward networks, a single pass of this equation is required for each layer to generate the output.
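
Before moving on to recurrent recall, the sketch below illustrates the two convergence tests described above: stopping when every weight update in a pass of the learning data set falls below a threshold, and then verifying the solution on an unseen test set. The threshold value and the output-matching rule are illustrative assumptions.

```python
import numpy as np

def has_converged(weight_updates, threshold=1e-4):
    """True if every weight modification made during the last pass of the
    learning data set is below the threshold value."""
    return max(abs(u).max() for u in weight_updates) < threshold

def test_set_accuracy(network_outputs, expected_outputs):
    """Verify a (possibly local-minimum) solution on data the network has
    not seen before: fraction of test vectors matched after thresholding."""
    predicted = (np.asarray(network_outputs) > 0.5).astype(int)
    return float(np.mean(np.all(predicted == expected_outputs, axis=1)))
```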

However, in a recurrent network the recall procedure requires multiple iterations around the feedback path until the state of the output vector remains constant. In this case, the input data are the outputs from the previous iteration.

Figure 2.1. Typical neural network topologies: (a) multilayer feedforward network; (b) recurrent network.

Activation Function. The activation or threshold function provides the neuron with its non-linear mapping characteristics. A variety of functions are used, depending upon the learning algorithm and data type. Some typical examples are shown in Figure 2.2. The sigmoid and ramp functions allow a neuron to signal its degree of confidence that a particular feature is present in its input data. The '-1' output of the signum function allows an output to be inhibitory, as opposed to an output of '0' which is simply non-excitatory.
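
A minimal sketch of the three activation functions of Figure 2.2 follows; the ramp saturation limits are an illustrative assumption.

```python
import numpy as np

def ramp(x, lower=-1.0, upper=1.0):
    """Linear between the limits, saturating outside them."""
    return np.clip(x, lower, upper)

def sigmoid(x):
    """Smooth, differentiable squashing into (0, 1) - a confidence value."""
    return 1.0 / (1.0 + np.exp(-x))

def signum(x):
    """+1 for excitatory, -1 for inhibitory outputs."""
    return np.where(x >= 0, 1.0, -1.0)
```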

Figure 2.2. Typical activation functions: ramp, sigmoid and signum.

Data Type. The input data for a model can be discrete (binary) or continuous. Discrete data models will generally employ parallel updating of neuron output values, whereas neurons which operate on continuous data may also function asynchronously.

2.3 Neural Learning Algorithms

This section provides an overview of a cross section of learning algorithms for ANNs. Introductions to several of these are provided by Lippmann [13] and Wasserman [14]. A glossary of the notation used in the following subsections is presented below.

    l       index of the layer (l=0 input, l=1 1st hidden, etc.)
    L       number of layers (excluding input layer)
    N_l     number of neurons in layer l
    j, k    indices of neurons
    w_jk    synaptic weight value between neurons j and k
    x_k     neuron input element k
    y_j     output value of neuron j
    e_j     expected output value of (output layer) neuron j
    a_j     bias value for neuron j
    T_j     radius value for neuron j
    η       constant global learning gain factor, 0 < η < 1
    η(t)    learning gain factor on iteration t, 0 < η(t) < 1
    f(·)    activation function
    V       input vectors in the learning data set
    v       input vector v
    t       iteration number

2.3.1 Hopfield

The Hopfield model is composed of a single layer of neurons with fully interconnected synapses, as shown in Figure 2.1(b). The model is an autoassociative type that is capable of realising associative memories, pattern classifiers and optimisation circuits. Although many people had previously examined this recurrent structure, it was the work of John Hopfield that provided a resurgence of interest [4]. His original proposal was for a binary input neuron with a step activation function. Each neuron in the layer could respond asynchronously to changes in its input data. Restrictions were placed on the synapse weight values, requiring the weight matrix to be symmetrical and contain only zeros on its diagonal. Hence, there was no feedback connection from a neuron to itself. These restrictions were placed on the model to ensure stability.

Learning

The more commonly used version of this learning algorithm employs a sigmoid activation function with parallel updating of the outputs and non-zero diagonal weight matrix terms. The process requires a one-shot calculation of the weight matrix values using (2.2) and is known as Hebbian learning.

In this case, binary values are initially converted into a bipolar format before calculating the sum of outer-product matrices for each weight.

    w_jk = Σ_{v=1..V} ( 2x_j^v - 1 )( 2x_k^v - 1 )    (2.2)

Recall

When presented with an input vector the Hopfield model must perform several iterations of the feedback loop to generate the associated output. On each iteration every neuron calculates the sum of its weighted inputs and thresholds the result. The output is fed back to the input for the next iteration until the network stabilises - indicated by no further changes in the output.

Problems, Solutions and Alternative Methods

Stability is always an issue with recurrent networks. The use of non-zero diagonal terms in the weight matrix can help to reduce the number of oscillating states and make the model less sensitive to noisy inputs. However, this can also lead to the generation of an increased number of spurious stable states that were not contained in the learning data. Annealing is a technique which can alleviate the problem of erroneous states or local minima [15].

The limited information storage capacity of Hopfield networks is another problem. For a network with C binary connections there are 2^C possible states, but the storage capacity is limited to approximately 0.15C [4]. Removal of the zero diagonal terms increases this capacity but at the expense of lower error correction capabilities and an increased susceptibility to false matches. Higher order nonlinear functions can also improve the capacity of the model [16].
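
The sketch below follows the one-shot Hebbian weight calculation of (2.2) and the iterative recall loop described above. For simplicity it uses a step threshold with parallel output updates; the iteration limit is an illustrative assumption.

```python
import numpy as np

def hopfield_learn(patterns):
    """One-shot Hebbian learning (2.2): sum of outer products of the
    bipolar (+1/-1) versions of the binary training vectors."""
    bipolar = 2 * np.asarray(patterns) - 1
    return sum(np.outer(p, p) for p in bipolar)

def hopfield_recall(weights, x, max_iters=50):
    """Iterate the feedback loop until the output no longer changes."""
    state = 2 * np.asarray(x) - 1
    for _ in range(max_iters):
        new_state = np.where(weights @ state >= 0, 1, -1)  # sum and threshold
        if np.array_equal(new_state, state):               # network stabilised
            break
        state = new_state
    return (state + 1) // 2                                # back to binary
```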

2.3.2 Bidirectional Associative Memory

The Bidirectional Associative Memory (BAM) is a recurrent network model that is similar in its capabilities to Hopfield. However, it is hetero-associative and can therefore produce an output vector that is related to the input, but is not necessarily the same. This characteristic is a result of the dual layer structure of the recurrent model, illustrated in Figure 2.3. Kosko's work has been particularly valuable in the development of this model [17].

Learning

The basic learning paradigm for the BAM is a single-shot process, similar to the Hopfield model. Associations between input and output vector pairs are encoded in the weights using (2.3), where T denotes the transpose.

    W = Σ_{v=1..V} ( X^v )^T Y^v    (2.3)

Recall

The weight matrix can be regarded as a long term memory, while the outputs of both layers A and B are short term. The aim of the recall process is to make the short term memory output of B converge to the stored vector in the long term memory that is associated with the present input vector. The step thresholded outputs from layer A are fed into B which then calculates its own outputs. These are fed back into A whereupon the cycle repeats until neither output vector changes. The output vectors for each layer after an iteration are given by (2.4) and (2.5).

Figure 2.3. Bidirectional Associative Memory.

    y_j(t+1) = f ( Σ_{k=1..K} x_k(t) w_jk )    (2.4)

    x_k(t+1) = f ( Σ_{j=1..J} y_j(t) w_jk )    (2.5)

Problems, Solutions and Alternative Methods

As with the Hopfield model, the BAM has a limited information storage capacity. Kosko estimated that the maximum number of associations that could be stored did not exceed the number of neurons, N, in the smallest layer [17]. More realistic calculations put the figure at N/(4 log2 N). Alternative activation functions can be used to increase this capacity.
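
A sketch of the BAM recall cycle of (2.4) and (2.5) follows: layer A's thresholded output drives layer B through W, and B's output is fed back through the transpose until neither output vector changes. The step threshold and iteration limit are illustrative assumptions.

```python
import numpy as np

def bam_recall(W, x, max_iters=50):
    """Bidirectional recall. W has shape (J, K): layer B has J neurons,
    layer A has K neurons; outputs are step-thresholded to +/-1."""
    step = lambda v: np.where(v >= 0, 1, -1)
    x = np.asarray(x, dtype=float)
    y = step(W @ x)                   # (2.4): layer B output from layer A
    for _ in range(max_iters):
        new_x = step(W.T @ y)         # (2.5): fed back to layer A
        new_y = step(W @ new_x)
        if np.array_equal(new_x, x) and np.array_equal(new_y, y):
            break                     # neither output vector changed
        x, y = new_x, new_y
    return y
```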

In the non-homogeneous BAM, each neuron can set its own threshold point which theoretically provides a maximum of 2N stable states. In practice however, the number is much less. Advanced forms of the BAM exist that allow asynchronous neuron state changes, adapt their weights during actual operation and have intra-layer inhibitory connections.

2.3.3 Back Propagation

Back Propagation (BP) is a learning algorithm for Multi-Layered Perceptron (MLP) networks. It was proposed by Rumelhart et al. [3] as a solution to the training of hidden Perceptrons and is one of the most popular algorithms in use. A typical MLP network structure is shown in Figure 2.4. This network is capable of creating multiple open or closed convex decision regions. These regions can become arbitrarily complex and concave in shape, with a second hidden layer of Perceptrons.

Learning

The learning process is a gradient search technique designed to minimise the mean square difference between the expected and actual output. Hence, it is a supervised algorithm and is most commonly used in classification tasks. It uses the sigmoid non-linearity as shown in Figure 2.2, since it requires a differentiable, hence continuous, activation function. The procedure begins with the initialisation of the weights to small (±0.1) random values. Three stages then follow - activation feedforward, error back propagation and weight update.

The activation feedforward stage requires each neuron in the network to calculate its activation value using the outputs from the previous layer, as shown in (2.6).

    y_j^l = f ( a_j + Σ_{k=0..N_{l-1}-1} w_jk x_k )    (2.6)

Once the output layer activations have been calculated, the difference between the actual and expected output can be obtained using (2.7). The successful convergence of the algorithm can be measured by the size of this error.

    δ_j = ( e_j - y_j ) f'( a_j + Σ_k w_jk x_k )    (2.7)

Using a first approximation to the derivative of the sigmoid activation function, (2.7) can be rewritten as shown in (2.8).

    δ_j = y_j ( 1 - y_j )( e_j - y_j )    (2.8)

The output layer errors must be propagated back to the adjacent hidden layer, which calculates its own output errors using (2.9), where i denotes the index of neurons in the layer above. All hidden layers repeat this process.

    δ_j = y_j ( 1 - y_j ) Σ_i δ_i w_ij    (2.9)

The final stage requires the weight values associated with each neuron to be modified in proportion to the size of the error. The bias value associated with each neuron is updated in the same way - see (2.10) and (2.11). The 3 stages of the learning process are repeated until the network converges.

Figure 2.4. Multilayer Perceptron network.

    Δw_jk = η δ_j x_k    (2.10)

    Δa_j = η δ_j    (2.11)

Recall

The recall procedure is exactly the same as the activation feedforward stage of learning, as formulated in (2.6).
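
The sketch below ties equations (2.6) to (2.11) together for a network with one hidden layer: activation feedforward, error back propagation using the y(1-y) approximation to the sigmoid derivative, and the η-scaled weight and bias updates. The network shape and learning gain are illustrative assumptions, and the weight arrays are updated in place.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def bp_step(x, e, W1, a1, W2, a2, eta=0.25):
    """One learning iteration for input vector x and expected output e."""
    # Activation feedforward (2.6)
    h = sigmoid(W1 @ x + a1)                      # hidden layer outputs
    y = sigmoid(W2 @ h + a2)                      # output layer outputs
    # Error back propagation (2.8) and (2.9)
    delta_out = y * (1 - y) * (e - y)
    delta_hid = h * (1 - h) * (W2.T @ delta_out)
    # Weight and bias updates (2.10) and (2.11), applied in place
    W2 += eta * np.outer(delta_out, h)
    a2 += eta * delta_out
    W1 += eta * np.outer(delta_hid, x)
    a1 += eta * delta_hid
    return y
```

Recall reuses only the two feedforward lines, exactly as noted above.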

Problems, Solutions and Alternative Methods

The learning process can be shortened by accumulating individual neuron error values over the complete learning data set and only then updating the weights. This is known as epoch learning and has the advantage that the learning process is not influenced by the order of exemplars in the data set. The addition of another term to the weight update equation can sometimes increase the rate of descent towards a global minimum. This momentum term can also increase the stability of the convergence procedure.

The BP algorithm has successfully demonstrated its ability to learn complex representations, notably in the NetTalk application [18]. However, networks are characteristically very large and the number of iterations of the data set during learning are noticeably greater than for other algorithms. Therefore, several modified BP algorithms have been proposed that employ dynamic neuron creation during learning [19,20] or post-learning pruning techniques to remove duplicated or redundant neurons [21,22]. Both methods aim to match the information capacity of the network to the information content of the data set. Networks which are over- or under-sized can both exhibit diminished generalisation characteristics when presented with a test data set.

Techniques that adapt the value of η during learning have also been shown to improve convergence times [23]. Similarly, the selection of initial weight values and the activation function parameters can also influence the learning speed of networks using the BP algorithm [24].
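
A brief sketch of the momentum term mentioned above: a fraction of the previous weight change is added to the current update to smooth and accelerate the descent. The momentum factor of 0.9 is an illustrative assumption.

```python
def momentum_update(weight, error_term, prev_delta, eta=0.25, alpha=0.9):
    """Weight update with momentum: the new change is the usual eta-scaled
    error term plus alpha times the previous change. Returns the new weight
    and the change to carry forward to the next iteration."""
    delta = eta * error_term + alpha * prev_delta
    return weight + delta, delta
```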

2.3.4 Restricted Coulomb Energy

The Restricted Coulomb Energy (RCE) model departs from the use of standard non-linear activation functions and instead employs neurons with Radial Basis Functions (RBFs). Typical examples of these are shown in Figure 2.5. Unlike the sigmoid or ramp functions used in other models, RBFs form functions over a finite region of input space. Thus arbitrarily complex decision regions can be formed with simpler networks, resulting in faster training, particularly for classification problems. The structure of the RCE model is shown in Figure 2.6. Note that each hidden RBF neuron is connected via a unity weighted synapse to only one output neuron and these employ standard threshold functions such as the sigmoid.

Learning

The RBF neurons do not compute their output from the sum of their weighted inputs, but instead regard the weights as defining a point in the input space. The output is calculated as a function of the distance from this point to the point defined by the input vector. Mathematically the RBF neuron output is given by (2.12). Learning is a two-stage process which must establish not only the position of the neuron's activation in the weight space but also its radius.

    y_j = f ( || X - W_j || )    (2.12)

An unsupervised process, such as the K-Means clustering algorithm, is often used to calculate the weights and hence determine the centre of each RBF. Alternatively, the RBF layer can be dynamic and grow in response to the information storage requirements during learning. Assuming a block step RBF threshold function, the procedure is as follows.

Figure 2.5. RBF neuron activation functions: Gaussian and block step.

Apply an input vector and calculate the output using (2.12). Sum and threshold the output layer neurons. For each output:

- If '1' and should be '0', shrink the radius of ALL neurons in the RBF layer that are outputting '1' until T_j = || X - W_j ||.
- If '0' and should be '1', spawn a new RBF neuron centred on this input point.
- Otherwise do nothing.

Repeat the process for all input vectors.

Recall

The recall process requires a single forward pass through the network in which the outputs of the RBF layer neurons are calculated using (2.12) and the output layer thresholds the sum of its unity weighted inputs.
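
The sketch below follows the block-step RCE procedure listed above, treating each RBF neuron individually for simplicity: wrong-class neurons that fire have their radii shrunk to the distance of the offending input, and an input that activates no correct-class neuron spawns a new one. The initial radius for spawned neurons is an illustrative assumption.

```python
import numpy as np

def rce_present(x, target, centres, radii, classes, new_radius=1.0):
    """One presentation of input x (known class `target`) to a block-step
    RCE network. RBF neuron i has a centre, a radius and an output class."""
    x = np.asarray(x, dtype=float)
    dists = [np.linalg.norm(x - c) for c in centres]
    firing = [i for i, d in enumerate(dists) if d <= radii[i]]  # block step (2.12)
    # Shrink every wrong-class neuron that fired, down to the offending distance.
    for i in firing:
        if classes[i] != target:
            radii[i] = dists[i]
    # If no correct-class neuron fired, spawn a new RBF neuron on this point.
    if not any(classes[i] == target for i in firing):
        centres.append(x.copy())
        radii.append(new_radius)
        classes.append(target)
```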

Figure 2.6. RCE model network.

Problems, Solutions and Alternative Methods

There are a number of extensions available for the RCE model, particularly in the structure of its learning algorithm. These are too detailed to discuss here but are summarised in [25].

Problems that exist with the basic RCE learning algorithm discussed above include the inability to increase the radii of RBF neurons or move the position of the function once the weight has been set.

Therefore, new neurons are often introduced unnecessarily. Furthermore, it is not possible to determine the importance - how often the output is active - of particular neurons in relation to others. Hence, the information content of a particular neuron cannot be ascertained. Ideally, the more information that becomes encoded in a neuron the less responsive that neuron should be to noisy input data.

2.3.5 The Adaptive Resonance Theory

The Adaptive Resonance Theory (ART) is an ANN model developed by Carpenter and Grossberg for unsupervised classification tasks [5]. The problem with most models is their inability to adapt to new input data after the initial learning process has been completed. The only way to introduce further classifications or associations is to retrain the ANN with the expanded data set. This problem was termed the 'Stability-Plasticity Dilemma' by Grossberg, who developed the ART model as its solution.

The structure of the model is illustrated in Figure 2.7 and can be seen to differ considerably from those reviewed so far, not least due to the existence of two distinct weight matrices, T and W. The Comparison and Recognition layers both maintain the neuron-like functionality of other models but have a number of discerning features. The Gain units and Reset provide the special control features that make the model unsupervised and will be discussed in the context of the operating procedure.

Since the model does not have distinct learning and recall phases it is not appropriate to describe two separate processes. Therefore, the general operation of the model will be explained, which will incorporate both the storage and retrieval phases. The version of the model described here uses binary input vectors.

Figure 2.7. The ART network.

There are four phases of operation of the model: Initialisation, Recognition, Comparison and Search. The initialisation phase, at t=0, involves the calculation of the weight matrices and the resetting of the output and gain signals. The feedback weights are all initialised to 1 while the values of the feedforward weights are determined by the number of neurons in the Comparison layer, K, using (2.13). The outputs of the Gain units, G1 and G2, are initialised to zero.

    w_jk(0) = 1 / ( 1 + K )    (2.13)

During the Recognition phase an input vector is fed into the Comparison layer.

Each neuron in this layer receives three inputs: G1, x_k and the feedback from the Recognition layer, t_k. The output is then calculated using the 'two-thirds' rule - if any two of the three inputs are active then a '1' is output, otherwise the output is '0'.

When a valid input is presented to the network, G1 is set to '1'. t_k is initially zero since no Recognition layer neurons are active. Therefore at this stage, in accordance with the two-thirds rule, the input vector is passed directly through to the Recognition layer where each neuron compares it against its own 'pattern template', stored in the weight matrix, W. This layer determines which stored pattern is the best match for the new input and activates the output of the appropriate neuron. For this, each neuron must calculate the dot product of the weight and the input vector. The neuron with the strongest output signal then 'turns off' all the others. This is achieved using connections between all neighbouring neurons and a process known as lateral inhibition. This winner-takes-all competition between output neurons ensures that only one can be active in response to a particular input.

Achieving an active state simply means that this stored pattern or class exemplar is the closest to the input vector; there must still be a process to decide if it is close enough. The winning neuron, j*, passes its class exemplar, stored in the weight matrix T, back to the Comparison layer. The Comparison phase can now take place. G1 is now set to zero so that the neurons can calculate a comparison vector C, in accordance with the two-thirds rule. This is fed into the Reset unit where the quality of the match is determined using (2.14). The result, S, must then be compared against a preset vigilance threshold, ρ, which makes the final decision regarding the similarity between this input vector and the active output class, as shown in (2.15).

    S = ( number of matching ones in the input and feedback vectors ) / ( number of ones in the input vector )    (2.14)

    If S > ρ then X belongs to class j*.
    If S ≤ ρ then X has been misclassified.    (2.15)

If the pattern falls into the selected class, the weights must be adapted to reflect the new input pattern using (2.16) and (2.17). The process may then restart for the next input pattern.

    t_j*k(t+1) = t_j*k(t) x_k    (2.16)

    w_j*k(t+1) = t_j*k(t) x_k / ( 0.5 + Σ_{k=1..K} t_j*k(t) x_k )    (2.17)

If the pattern has been misclassified, the model has to check all the other possible output classes and create a new one if necessary. The Search phase performs this task by initially disabling the output of the 'bad' neuron using G2 and allowing the next strongest output, that was previously inhibited, to become active. The same process is repeated until an output class is found that meets the vigilance threshold. If none of the classes are close enough to the input vector, a new class exemplar must be created and the weights determined as above.
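
The following sketch illustrates the control flow just described - recognition in winner-takes-all order, the vigilance test of (2.14) and (2.15), and the search for, or creation of, a class - for binary vectors. It simplifies the weight handling (a single stored exemplar per class stands in for the T and W matrices), so it should be read as an outline of the procedure rather than a full ART-1 implementation.

```python
import numpy as np

def art_classify(x, exemplars, rho):
    """Assign binary input x to an output class whose stored exemplar passes
    the vigilance test, creating a new class if none does."""
    x = np.asarray(x)
    # Recognition: consider classes in order of dot-product strength
    # (the winner-takes-all ordering).
    order = sorted(range(len(exemplars)), key=lambda j: -np.dot(exemplars[j], x))
    for j in order:
        c = np.logical_and(exemplars[j], x)       # comparison vector C
        s = c.sum() / max(x.sum(), 1)             # match quality (2.14)
        if s > rho:                               # vigilance test (2.15)
            exemplars[j] = c.astype(int)          # adapt the stored exemplar
            return j
    exemplars.append(x.copy())                    # no class was close enough
    return len(exemplars) - 1
```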

Problems, Solutions and Alternative Methods

It is clear that the vigilance parameter will severely influence the number of classes that are created by the model. Too low a value of ρ will cause very dissimilar vectors to be assigned to the same class and too high a value will make the model too sensitive to minor differences. Furthermore, it is difficult to determine the optimum value for ρ due to its sensitivity to the context of the training data.

Two forms of learning are available - fast and slow. The fast process has been described above and is the most commonly used. Slow learning requires several presentations of each input vector and each time small modifications are made to the weights until convergence is achieved. This method results in weights that are less influenced by any one input and hence better classification characteristics are obtained.

More advanced forms of ART are available. ART-2 develops the model for continuous valued input data [26] and improves the capabilities of the model significantly. ART-3 [27] moves the model closer to a biological system representation by allowing it to respond to real-time, constantly varying inputs.

2.3.6 The Self-Organising Feature Map

The self-organising feature map, developed by Kohonen [28], relies on biological evidence which suggests that sensory pathways in the brain are arranged to reflect the characteristics of the input stimuli. Hence, the model makes the assumptions that input patterns with common features will define the class structure and that these features can be extracted from the input data. The structure of the network is illustrated in Figure 2.8.

It consists of a single layer, 2D grid arrangement of neurons that are connected to their nearest neighbours and fully connected to an input layer of K neurons. Each neuron on the grid produces an output which represents a particular class of input vector.

Learning

The unsupervised learning procedure selects a region of the 2D output space to represent a particular class of K-dimensional input vectors. The connections between adjacent neurons in the grid create a winner-takes-all network in which only the strongest output is active. Usually the weights and input data are normalised. This ensures that the training process uses the spatial orientation of vectors rather than their magnitudes to determine the region in the output space or class to which an input belongs. The learning process is therefore faster than for some other neural algorithms, as a degree of variability is removed from the weight space.

When the weights are initialised, they are randomly spread around the normalised weight space. This can cause learning difficulties if the input vectors are not evenly distributed, as neurons whose weight orientation is very different to that of the learning data may never become active, allowing regions of the output space to remain unused. To prevent this, the learning paradigm employs a 'shrinking neighbourhood' technique to localise the effects of weight modifications. This forces inputs that are too dissimilar to search for their own independent regions of the output space. The initial size of the neighbourhood around each neuron, M_j(0), is dependent upon the dimensions of the grid and of the input vector.

The winner-takes-all selection process involves the calculation of the Euclidean distance d_j between the input and weight vectors for each neuron, as shown in (2.18). The neuron with the minimum distance is chosen to be updated and is designated j*.

    d_j = Σ_{k=1..K} ( x_k - w_jk )²    (2.18)
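
A sketch of the winner selection of (2.18) follows: the distance between the input vector and each grid neuron's weight vector is calculated and the minimum-distance neuron j* is chosen. The neighbourhood update shown follows the usual Kohonen rule with a square, shrinking neighbourhood; the grid shape, learning gain and radius handling are illustrative assumptions rather than details taken from this chapter.

```python
import numpy as np

def winning_neuron(x, W):
    """Winner-takes-all selection (2.18): W holds one weight vector per grid
    neuron (one row each); returns the index j* of the closest neuron."""
    d = np.sum((W - np.asarray(x)) ** 2, axis=1)   # squared Euclidean distances
    return int(np.argmin(d))

def update_neighbourhood(x, W, j_star, grid_shape, radius, eta):
    """Move the winner and its grid neighbours within `radius` towards x."""
    rows, cols = np.unravel_index(np.arange(W.shape[0]), grid_shape)
    win_r, win_c = np.unravel_index(j_star, grid_shape)
    inside = np.maximum(np.abs(rows - win_r), np.abs(cols - win_c)) <= radius
    W[inside] += eta * (np.asarray(x) - W[inside])
    return W
```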


More information

Figure (5) Kohonen Self-Organized Map

Figure (5) Kohonen Self-Organized Map 2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;

More information

Neural Networks. Neural Network. Neural Network. Neural Network 2/21/2008. Andrew Kusiak. Intelligent Systems Laboratory Seamans Center

Neural Networks. Neural Network. Neural Network. Neural Network 2/21/2008. Andrew Kusiak. Intelligent Systems Laboratory Seamans Center Neural Networks Neural Network Input Andrew Kusiak Intelligent t Systems Laboratory 2139 Seamans Center Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335

More information

Assignment # 5. Farrukh Jabeen Due Date: November 2, Neural Networks: Backpropation

Assignment # 5. Farrukh Jabeen Due Date: November 2, Neural Networks: Backpropation Farrukh Jabeen Due Date: November 2, 2009. Neural Networks: Backpropation Assignment # 5 The "Backpropagation" method is one of the most popular methods of "learning" by a neural network. Read the class

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Neural Networks. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Neural Networks. Robot Image Credit: Viktoriya Sukhanova 123RF.com Neural Networks These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides

More information

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples. Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

Extending reservoir computing with random static projections: a hybrid between extreme learning and RC

Extending reservoir computing with random static projections: a hybrid between extreme learning and RC Extending reservoir computing with random static projections: a hybrid between extreme learning and RC John Butcher 1, David Verstraeten 2, Benjamin Schrauwen 2,CharlesDay 1 and Peter Haycock 1 1- Institute

More information

II. ARTIFICIAL NEURAL NETWORK

II. ARTIFICIAL NEURAL NETWORK Applications of Artificial Neural Networks in Power Systems: A Review Harsh Sareen 1, Palak Grover 2 1, 2 HMR Institute of Technology and Management Hamidpur New Delhi, India Abstract: A standout amongst

More information

Assignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions

Assignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan

More information

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting

More information

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation

More information

Seismic regionalization based on an artificial neural network

Seismic regionalization based on an artificial neural network Seismic regionalization based on an artificial neural network *Jaime García-Pérez 1) and René Riaño 2) 1), 2) Instituto de Ingeniería, UNAM, CU, Coyoacán, México D.F., 014510, Mexico 1) jgap@pumas.ii.unam.mx

More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Efficient Object Extraction Using Fuzzy Cardinality Based Thresholding and Hopfield Network

Efficient Object Extraction Using Fuzzy Cardinality Based Thresholding and Hopfield Network Efficient Object Extraction Using Fuzzy Cardinality Based Thresholding and Hopfield Network S. Bhattacharyya U. Maulik S. Bandyopadhyay Dept. of Information Technology Dept. of Comp. Sc. and Tech. Machine

More information

Artificial Neuron Modelling Based on Wave Shape

Artificial Neuron Modelling Based on Wave Shape Artificial Neuron Modelling Based on Wave Shape Kieran Greer, Distributed Computing Systems, Belfast, UK. http://distributedcomputingsystems.co.uk Version 1.2 Abstract This paper describes a new model

More information

New wavelet based ART network for texture classification

New wavelet based ART network for texture classification University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 1996 New wavelet based ART network for texture classification Jiazhao

More information

MATLAB representation of neural network Outline Neural network with single-layer of neurons. Neural network with multiple-layer of neurons.

MATLAB representation of neural network Outline Neural network with single-layer of neurons. Neural network with multiple-layer of neurons. MATLAB representation of neural network Outline Neural network with single-layer of neurons. Neural network with multiple-layer of neurons. Introduction: Neural Network topologies (Typical Architectures)

More information

Convex combination of adaptive filters for a variable tap-length LMS algorithm

Convex combination of adaptive filters for a variable tap-length LMS algorithm Loughborough University Institutional Repository Convex combination of adaptive filters for a variable tap-length LMS algorithm This item was submitted to Loughborough University's Institutional Repository

More information

Controlling the spread of dynamic self-organising maps

Controlling the spread of dynamic self-organising maps Neural Comput & Applic (2004) 13: 168 174 DOI 10.1007/s00521-004-0419-y ORIGINAL ARTICLE L. D. Alahakoon Controlling the spread of dynamic self-organising maps Received: 7 April 2004 / Accepted: 20 April

More information

SVM-based Filter Using Evidence Theory and Neural Network for Image Denosing

SVM-based Filter Using Evidence Theory and Neural Network for Image Denosing Journal of Software Engineering and Applications 013 6 106-110 doi:10.436/sea.013.63b03 Published Online March 013 (http://www.scirp.org/ournal/sea) SVM-based Filter Using Evidence Theory and Neural Network

More information

Artificial Neural Network based Curve Prediction

Artificial Neural Network based Curve Prediction Artificial Neural Network based Curve Prediction LECTURE COURSE: AUSGEWÄHLTE OPTIMIERUNGSVERFAHREN FÜR INGENIEURE SUPERVISOR: PROF. CHRISTIAN HAFNER STUDENTS: ANTHONY HSIAO, MICHAEL BOESCH Abstract We

More information

Dynamic Analysis of Structures Using Neural Networks

Dynamic Analysis of Structures Using Neural Networks Dynamic Analysis of Structures Using Neural Networks Alireza Lavaei Academic member, Islamic Azad University, Boroujerd Branch, Iran Alireza Lohrasbi Academic member, Islamic Azad University, Boroujerd

More information

Multilayer Feed-forward networks

Multilayer Feed-forward networks Multi Feed-forward networks 1. Computational models of McCulloch and Pitts proposed a binary threshold unit as a computational model for artificial neuron. This first type of neuron has been generalized

More information

Climate Precipitation Prediction by Neural Network

Climate Precipitation Prediction by Neural Network Journal of Mathematics and System Science 5 (205) 207-23 doi: 0.7265/259-529/205.05.005 D DAVID PUBLISHING Juliana Aparecida Anochi, Haroldo Fraga de Campos Velho 2. Applied Computing Graduate Program,

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute

More information

m Environment Output Activation 0.8 Output Activation Input Value

m Environment Output Activation 0.8 Output Activation Input Value Learning Sensory-Motor Cortical Mappings Without Training Mike Spratling Gillian Hayes Department of Articial Intelligence University of Edinburgh mikes@dai.ed.ac.uk gmh@dai.ed.ac.uk Abstract. This paper

More information

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter Review: Final Exam Model for a Learning Step Learner initially Environm ent Teacher Compare s pe c ia l Information Control Correct Learning criteria Feedback changed Learner after Learning Learning by

More information

Neural Networks (Overview) Prof. Richard Zanibbi

Neural Networks (Overview) Prof. Richard Zanibbi Neural Networks (Overview) Prof. Richard Zanibbi Inspired by Biology Introduction But as used in pattern recognition research, have little relation with real neural systems (studied in neurology and neuroscience)

More information

Neuron Selectivity as a Biologically Plausible Alternative to Backpropagation

Neuron Selectivity as a Biologically Plausible Alternative to Backpropagation Neuron Selectivity as a Biologically Plausible Alternative to Backpropagation C.J. Norsigian Department of Bioengineering cnorsigi@eng.ucsd.edu Vishwajith Ramesh Department of Bioengineering vramesh@eng.ucsd.edu

More information

An Integer Recurrent Artificial Neural Network for Classifying Feature Vectors

An Integer Recurrent Artificial Neural Network for Classifying Feature Vectors An Integer Recurrent Artificial Neural Network for Classifying Feature Vectors Roelof K Brouwer PEng, PhD University College of the Cariboo, Canada Abstract: The main contribution of this report is the

More information

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Ensemble methods in machine learning. Example. Neural networks. Neural networks Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you

More information

Multi-Layered Perceptrons (MLPs)

Multi-Layered Perceptrons (MLPs) Multi-Layered Perceptrons (MLPs) The XOR problem is solvable if we add an extra node to a Perceptron A set of weights can be found for the above 5 connections which will enable the XOR of the inputs to

More information

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve

More information

Exercise 2: Hopeld Networks

Exercise 2: Hopeld Networks Articiella neuronnät och andra lärande system, 2D1432, 2004 Exercise 2: Hopeld Networks [Last examination date: Friday 2004-02-13] 1 Objectives This exercise is about recurrent networks, especially the

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Artificial Neural Networks MLP, RBF & GMDH

Artificial Neural Networks MLP, RBF & GMDH Artificial Neural Networks MLP, RBF & GMDH Jan Drchal drchajan@fel.cvut.cz Computational Intelligence Group Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical

More information

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies http://blog.csdn.net/zouxy09/article/details/8775360 Automatic Colorization of Black and White Images Automatically Adding Sounds To Silent Movies Traditionally this was done by hand with human effort

More information

Neural Network Approach for Automatic Landuse Classification of Satellite Images: One-Against-Rest and Multi-Class Classifiers

Neural Network Approach for Automatic Landuse Classification of Satellite Images: One-Against-Rest and Multi-Class Classifiers Neural Network Approach for Automatic Landuse Classification of Satellite Images: One-Against-Rest and Multi-Class Classifiers Anil Kumar Goswami DTRL, DRDO Delhi, India Heena Joshi Banasthali Vidhyapith

More information

Channel Performance Improvement through FF and RBF Neural Network based Equalization

Channel Performance Improvement through FF and RBF Neural Network based Equalization Channel Performance Improvement through FF and RBF Neural Network based Equalization Manish Mahajan 1, Deepak Pancholi 2, A.C. Tiwari 3 Research Scholar 1, Asst. Professor 2, Professor 3 Lakshmi Narain

More information

Neural Network Learning. Today s Lecture. Continuation of Neural Networks. Artificial Neural Networks. Lecture 24: Learning 3. Victor R.

Neural Network Learning. Today s Lecture. Continuation of Neural Networks. Artificial Neural Networks. Lecture 24: Learning 3. Victor R. Lecture 24: Learning 3 Victor R. Lesser CMPSCI 683 Fall 2010 Today s Lecture Continuation of Neural Networks Artificial Neural Networks Compose of nodes/units connected by links Each link has a numeric

More information

COLLABORATIVE AGENT LEARNING USING HYBRID NEUROCOMPUTING

COLLABORATIVE AGENT LEARNING USING HYBRID NEUROCOMPUTING COLLABORATIVE AGENT LEARNING USING HYBRID NEUROCOMPUTING Saulat Farooque and Lakhmi Jain School of Electrical and Information Engineering, University of South Australia, Adelaide, Australia saulat.farooque@tenix.com,

More information

IMPLEMENTATION OF RBF TYPE NETWORKS BY SIGMOIDAL FEEDFORWARD NEURAL NETWORKS

IMPLEMENTATION OF RBF TYPE NETWORKS BY SIGMOIDAL FEEDFORWARD NEURAL NETWORKS IMPLEMENTATION OF RBF TYPE NETWORKS BY SIGMOIDAL FEEDFORWARD NEURAL NETWORKS BOGDAN M.WILAMOWSKI University of Wyoming RICHARD C. JAEGER Auburn University ABSTRACT: It is shown that by introducing special

More information

IN recent years, neural networks have attracted considerable attention

IN recent years, neural networks have attracted considerable attention Multilayer Perceptron: Architecture Optimization and Training Hassan Ramchoun, Mohammed Amine Janati Idrissi, Youssef Ghanou, Mohamed Ettaouil Modeling and Scientific Computing Laboratory, Faculty of Science

More information

FAST NEURAL NETWORK ALGORITHM FOR SOLVING CLASSIFICATION TASKS

FAST NEURAL NETWORK ALGORITHM FOR SOLVING CLASSIFICATION TASKS Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2012 FAST NEURAL NETWORK ALGORITHM FOR SOLVING CLASSIFICATION TASKS Noor Albarakati Virginia Commonwealth

More information

Neuro-Fuzzy Computing

Neuro-Fuzzy Computing CSE53 Neuro-Fuzzy Computing Tutorial/Assignment 3: Unsupervised Learning About this tutorial The objective of this tutorial is to study unsupervised learning, in particular: (Generalized) Hebbian learning.

More information

Parallel Evaluation of Hopfield Neural Networks

Parallel Evaluation of Hopfield Neural Networks Parallel Evaluation of Hopfield Neural Networks Antoine Eiche, Daniel Chillet, Sebastien Pillement and Olivier Sentieys University of Rennes I / IRISA / INRIA 6 rue de Kerampont, BP 818 2232 LANNION,FRANCE

More information

A Comparative Study of Conventional and Neural Network Classification of Multispectral Data

A Comparative Study of Conventional and Neural Network Classification of Multispectral Data A Comparative Study of Conventional and Neural Network Classification of Multispectral Data B.Solaiman & M.C.Mouchot Ecole Nationale Supérieure des Télécommunications de Bretagne B.P. 832, 29285 BREST

More information

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions ENEE 739Q SPRING 2002 COURSE ASSIGNMENT 2 REPORT 1 Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions Vikas Chandrakant Raykar Abstract The aim of the

More information

Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design

Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design Loughborough University Institutional Repository Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design This item was submitted to Loughborough University's

More information

CHAPTER 6 COUNTER PROPAGATION NEURAL NETWORK IN GAIT RECOGNITION

CHAPTER 6 COUNTER PROPAGATION NEURAL NETWORK IN GAIT RECOGNITION 75 CHAPTER 6 COUNTER PROPAGATION NEURAL NETWORK IN GAIT RECOGNITION 6.1 INTRODUCTION Counter propagation network (CPN) was developed by Robert Hecht-Nielsen as a means to combine an unsupervised Kohonen

More information

Artificial Neural Networks

Artificial Neural Networks The Perceptron Rodrigo Fernandes de Mello Invited Professor at Télécom ParisTech Associate Professor at Universidade de São Paulo, ICMC, Brazil http://www.icmc.usp.br/~mello mello@icmc.usp.br Conceptually

More information

Artificial Neural Networks Unsupervised learning: SOM

Artificial Neural Networks Unsupervised learning: SOM Artificial Neural Networks Unsupervised learning: SOM 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001

More information

Design and Performance Analysis of and Gate using Synaptic Inputs for Neural Network Application

Design and Performance Analysis of and Gate using Synaptic Inputs for Neural Network Application IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 12 May 2015 ISSN (online): 2349-6010 Design and Performance Analysis of and Gate using Synaptic Inputs for Neural

More information

Keywords: ANN; network topology; bathymetric model; representability.

Keywords: ANN; network topology; bathymetric model; representability. Proceedings of ninth International Conference on Hydro-Science and Engineering (ICHE 2010), IIT Proceedings Madras, Chennai, of ICHE2010, India. IIT Madras, Aug 2-5,2010 DETERMINATION OF 2 NETWORK - 5

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without a teacher No targets for the outputs Networks which discover patterns, correlations, etc. in the input data This is a self organisation Self organising networks An

More information

CHAPTER VI BACK PROPAGATION ALGORITHM

CHAPTER VI BACK PROPAGATION ALGORITHM 6.1 Introduction CHAPTER VI BACK PROPAGATION ALGORITHM In the previous chapter, we analysed that multiple layer perceptrons are effectively applied to handle tricky problems if trained with a vastly accepted

More information

A Data Classification Algorithm of Internet of Things Based on Neural Network

A Data Classification Algorithm of Internet of Things Based on Neural Network A Data Classification Algorithm of Internet of Things Based on Neural Network https://doi.org/10.3991/ijoe.v13i09.7587 Zhenjun Li Hunan Radio and TV University, Hunan, China 278060389@qq.com Abstract To

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with

More information

NeuroScale: Novel Topographic Feature Extraction using RBF Networks

NeuroScale: Novel Topographic Feature Extraction using RBF Networks NeuroScale: Novel Topographic Feature Extraction using RBF Networks David Lowe D.LoweOaston.ac.uk Michael E. Tipping H.E.TippingOaston.ac.uk Neural Computing Research Group Aston University, Aston Triangle,

More information

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-"&"3 -"(' ( +-" " " % '.+ % ' -0(+$,

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-&3 -(' ( +-   % '.+ % ' -0(+$, The structure is a very important aspect in neural network design, it is not only impossible to determine an optimal structure for a given problem, it is even impossible to prove that a given structure

More information