Neural Networks
Prof. Dr. Rudolf Kruse
Computational Intelligence Group
Faculty for Computer Science
kruse@iws.cs.uni-magdeburg.de
Radial Basis Function Networks
Radial Basis Function Networks

Properties of radial basis function networks (RBF networks):

RBF networks are strictly layered, feed-forward neural networks with exactly one hidden layer.

Radial basis functions are used as network input and activation functions. This way a catchment region is assigned to every hidden neuron.

The weights of the connections from the input layer to a hidden neuron indicate the center of this catchment region.
Radial Basis Function Networks

A radial basis function network (RBFN) is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  $U_{\mathrm{in}} \cap U_{\mathrm{out}} = \emptyset$,
(ii) $C = (U_{\mathrm{in}} \times U_{\mathrm{hidden}}) \cup C'$, $\quad C' \subseteq (U_{\mathrm{hidden}} \times U_{\mathrm{out}})$.

The network input function of each hidden neuron is a distance function of the input vector and the weight vector, i.e.

$\forall u \in U_{\mathrm{hidden}}: \quad f_{\mathrm{net}}^{(u)}(\vec{w}_u, \vec{\mathrm{in}}_u) = d(\vec{w}_u, \vec{\mathrm{in}}_u),$

where $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_0^+$ is a function satisfying $\forall \vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^n$:

(i)   $d(\vec{x}, \vec{y}) = 0 \;\Leftrightarrow\; \vec{x} = \vec{y}$,
(ii)  $d(\vec{x}, \vec{y}) = d(\vec{y}, \vec{x})$ (symmetry),
(iii) $d(\vec{x}, \vec{z}) \le d(\vec{x}, \vec{y}) + d(\vec{y}, \vec{z})$ (triangle inequality).
Distance Functions

Illustration of distance functions (Minkowski family):

$d_k(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{n} (x_i - y_i)^k \right)^{\frac{1}{k}}$

Well-known special cases from this family are:

k = 1:         Manhattan or city block distance,
k = 2:         Euclidean distance,
k to infinity: maximum distance, i.e. $d_\infty(\vec{x}, \vec{y}) = \max_{i=1}^{n} |x_i - y_i|$.

(Figure: curves of equal distance from a center for k = 1, k = 2, and the maximum distance.)
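A minimal NumPy sketch of this distance family (the function name is my own; absolute values are used so the formula is well defined for all k):

import numpy as np

def minkowski_distance(x, y, k):
    """Distance d_k(x, y) = (sum_i |x_i - y_i|^k)^(1/k).

    k = 1 gives the Manhattan distance, k = 2 the Euclidean
    distance; for k -> infinity it approaches the maximum
    distance max_i |x_i - y_i|.
    """
    return np.sum(np.abs(x - y) ** k) ** (1.0 / k)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski_distance(x, y, 1))    # 7.0  (Manhattan)
print(minkowski_distance(x, y, 2))    # 5.0  (Euclidean)
print(minkowski_distance(x, y, 100))  # ~4.0 (close to maximum distance)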
Radial Basis Function Networks

The network input function of the output neurons is the weighted sum of their inputs, i.e.

$\forall u \in U_{\mathrm{out}}: \quad f_{\mathrm{net}}^{(u)}(\vec{w}_u, \vec{\mathrm{in}}_u) = \vec{w}_u \cdot \vec{\mathrm{in}}_u = \sum_{v \in \mathrm{pred}(u)} w_{uv} \, \mathrm{out}_v.$

The activation function of each hidden neuron is a so-called radial function, i.e. a monotonically decreasing function

$f : \mathbb{R}_0^+ \to [0, 1] \quad \text{with} \quad f(0) = 1 \quad \text{and} \quad \lim_{x \to \infty} f(x) = 0.$

The activation function of each output neuron is a linear function, namely

$f_{\mathrm{act}}^{(u)}(\mathrm{net}_u, \theta_u) = \mathrm{net}_u - \theta_u.$

(The linear activation function is important for the initialization.)
Radial Activation Functions

rectangle function:
$f_{\mathrm{act}}(\mathrm{net}, \sigma) = \begin{cases} 0, & \text{if } \mathrm{net} > \sigma, \\ 1, & \text{otherwise.} \end{cases}$

triangle function:
$f_{\mathrm{act}}(\mathrm{net}, \sigma) = \begin{cases} 0, & \text{if } \mathrm{net} > \sigma, \\ 1 - \frac{\mathrm{net}}{\sigma}, & \text{otherwise.} \end{cases}$

cosine until zero:
$f_{\mathrm{act}}(\mathrm{net}, \sigma) = \begin{cases} 0, & \text{if } \mathrm{net} > 2\sigma, \\ \frac{1}{2}\left(\cos\!\left(\frac{\pi}{2\sigma} \, \mathrm{net}\right) + 1\right), & \text{otherwise.} \end{cases}$

Gaussian function:
$f_{\mathrm{act}}(\mathrm{net}, \sigma) = e^{-\frac{\mathrm{net}^2}{2\sigma^2}}$

(Figures: graphs of the four functions over net, with the reference radius $\sigma$ marked on the axis.)
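A hedged sketch of these four radial functions in NumPy; "net" is the distance of the input vector from the center, "sigma" the reference radius:

import numpy as np

def rectangle(net, sigma):
    # 1 inside the catchment region, 0 outside
    return np.where(net > sigma, 0.0, 1.0)

def triangle(net, sigma):
    # decays linearly from 1 at the center to 0 at distance sigma
    return np.where(net > sigma, 0.0, 1.0 - net / sigma)

def cosine_until_zero(net, sigma):
    # half cosine wave reaching 0 at distance 2*sigma
    return np.where(net > 2 * sigma, 0.0,
                    (np.cos(np.pi / (2 * sigma) * net) + 1) / 2)

def gaussian(net, sigma):
    # never exactly 0, but decays rapidly
    return np.exp(-net**2 / (2 * sigma**2))

net = np.linspace(0.0, 3.0, 7)
print(gaussian(net, 1.0))  # 1 at net = 0, decaying toward 0

All four satisfy the definition of a radial function: value 1 at distance 0, monotonically decreasing toward 0.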
Radial Basis Function Networks: Examples

Radial basis function network for the conjunction $x_1 \wedge x_2$:

(Figure: network with inputs $x_1$, $x_2$, one hidden neuron, and output neuron y; plot of the catchment region in the $x_1 x_2$ plane.)

(1,1) is the center, 1/2 is the reference radius.
Euclidean distance, rectangle function as the activation function, bias of 0 in the output neuron.
Radial Basis Function Networks: Examples

Radial basis function network for the conjunction $x_1 \wedge x_2$ (alternative solution):

(Figure: network with one hidden neuron centered at the origin and a plot of its catchment region.)

(0,0) is the center, 6/5 is the reference radius.
Euclidean distance, rectangle function as the activation function, connection weight -1 and bias of -1 in the output neuron: only the input (1,1) lies outside the catchment region, so only it yields output 1.
Radial Basis Function Networks: Examples

Radial basis function network for the biimplication $x_1 \leftrightarrow x_2$:

Idea: logical decomposition

$x_1 \leftrightarrow x_2 \;\equiv\; (x_1 \wedge x_2) \vee \neg(x_1 \vee x_2)$

(Figure: network with two hidden neurons, one for each term of the decomposition, feeding the output neuron y; plot of the two catchment regions in the $x_1 x_2$ plane.)
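A small sketch of the forward pass of such a network; the concrete centers, radius, weights, and bias are assumptions in the spirit of the decomposition above:

import numpy as np

# Biimplication RBF network: two hidden neurons with rectangle
# activation, centered at (1,1) and (0,0), radius 1/2 each;
# output neuron with weights 1, 1 and bias 0 (assumed values).
centers = np.array([[1.0, 1.0], [0.0, 0.0]])
sigma = 0.5
out_weights = np.array([1.0, 1.0])
theta = 0.0

def rbf_forward(x):
    net = np.linalg.norm(centers - x, axis=1)   # Euclidean distances
    hidden = np.where(net > sigma, 0.0, 1.0)    # rectangle activation
    return out_weights @ hidden - theta         # linear output neuron

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, rbf_forward(np.array(x, dtype=float)))
# -> 1, 0, 0, 1: exactly the biimplication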
Radial Basis Function Networks: Function Approximation

(Figure: a continuous function y(x) and its approximation by a step function with values $y_1, \dots, y_4$ at the supporting points $x_1, \dots, x_4$.)

Approximation of the original function by a step function, each step of which is represented by one neuron of the RBF network (compare MLPs).
Radial Basis Function Networks: Function Approximation

(Figure: RBF network with one input x, four hidden neurons with centers $x_1, \dots, x_4$ and radius $\sigma$, and output weights $y_1, \dots, y_4$.)

$\sigma = \frac{1}{2}\Delta x = \frac{1}{2}(x_{i+1} - x_i)$

This RBF network computes the step function shown on the previous slide, as well as the piecewise linear function on the following slide. The only change needed is the activation function of the hidden neurons (rectangle vs. triangle function).
Radial Basis Function Networks: Function Approximation

(Figure: the same function approximated by a piecewise linear function; triangular basis functions centered at $x_1, \dots, x_4$ and their weighted sum.)

Representation of a piecewise linear function by a weighted sum of triangular functions with centers $x_i$.
Radial Basis Function Networks: Function Approximation

(Figure: a function and its approximation by the weighted sum of three Gaussian bells.)

Approximation of a function by the sum of Gaussians with radius $\sigma = 1$ and weights $w_1 = 1$, $w_2 = 3$ and $w_3 = -2$.
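Evaluating such a weighted sum of Gaussians is a one-liner; the centers (2, 5, 6) below are taken from the network on the next slide, the rest from this one:

import numpy as np

# Weighted sum of three Gaussian basis functions:
# sigma = 1, weights 1, 3, -2, centers 2, 5, 6.
centers = np.array([2.0, 5.0, 6.0])
weights = np.array([1.0, 3.0, -2.0])
sigma = 1.0

def f(x):
    # Each hidden neuron computes a Gaussian of the distance
    # |x - c|; the output neuron forms the weighted sum.
    return weights @ np.exp(-(x - centers)**2 / (2 * sigma**2))

for x in np.linspace(0.0, 8.0, 5):
    print(f"f({x:.1f}) = {f(x):+.4f}")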
Radial Basis Function Networks: Function Approximation

Radial basis function network for a sum of three Gaussian functions:

(Figure: network with one input x, three hidden neurons with centers 2, 5 and 6 and radius 1, and output weights 1, 3 and -2 feeding the output neuron y.)
Training Radial Basis Function Networks
Radial Basis Function Networks: Initialization

Let $L_{\mathrm{fixed}} = \{l_1, \dots, l_m\}$ be a fixed learning task, consisting of m training patterns $l = (\vec{\imath}^{\,(l)}, \vec{o}^{\,(l)})$.

Simple radial basis function network: one hidden neuron $v_k$, $k = 1, \dots, m$, for each training pattern:

$\forall k \in \{1, \dots, m\}: \quad \vec{w}_{v_k} = \vec{\imath}^{\,(l_k)}.$

If the activation function is the Gaussian function, the radii $\sigma_k$ are chosen heuristically as

$\forall k \in \{1, \dots, m\}: \quad \sigma_k = \frac{d_{\max}}{\sqrt{2m}}, \quad \text{where} \quad d_{\max} = \max_{l_j, l_k \in L_{\mathrm{fixed}}} d\!\left(\vec{\imath}^{\,(l_j)}, \vec{\imath}^{\,(l_k)}\right).$
Radial Basis Function Networks: Initialization

Initializing the connections from the hidden to the output neurons:

$\forall u: \quad \sum_{k=1}^{m} w_{u v_k} \mathrm{out}_{v_k}^{(l)} - \theta_u = o_u^{(l)} \quad \text{or abbreviated} \quad \mathbf{A} \cdot \vec{w}_u = \vec{o}_u,$

where $\vec{o}_u = \left(o_u^{(l_1)}, \dots, o_u^{(l_m)}\right)^\top$ is the vector of desired outputs, $\theta_u = 0$, and

$\mathbf{A} = \begin{pmatrix}
\mathrm{out}_{v_1}^{(l_1)} & \mathrm{out}_{v_2}^{(l_1)} & \dots & \mathrm{out}_{v_m}^{(l_1)} \\
\mathrm{out}_{v_1}^{(l_2)} & \mathrm{out}_{v_2}^{(l_2)} & \dots & \mathrm{out}_{v_m}^{(l_2)} \\
\vdots & \vdots & & \vdots \\
\mathrm{out}_{v_1}^{(l_m)} & \mathrm{out}_{v_2}^{(l_m)} & \dots & \mathrm{out}_{v_m}^{(l_m)}
\end{pmatrix}.$

This is a linear equation system that can be solved by inverting the matrix $\mathbf{A}$:

$\vec{w}_u = \mathbf{A}^{-1} \cdot \vec{o}_u.$
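A sketch of this initialization for a simple RBF network with Gaussian activation functions (function and variable names are my own):

import numpy as np

def init_simple_rbf(inputs, targets):
    # one hidden neuron per training pattern: w_vk = i^(lk)
    m = len(inputs)
    centers = inputs.copy()
    # pairwise distances -> heuristic radius d_max / sqrt(2m)
    d = np.linalg.norm(inputs[:, None, :] - inputs[None, :, :], axis=2)
    sigma = d.max() / np.sqrt(2 * m)
    # matrix A of hidden activations for all training patterns
    A = np.exp(-d**2 / (2 * sigma**2))
    # solve A w = o (equivalent to w = A^-1 o, but numerically stabler)
    w = np.linalg.solve(A, targets)
    return centers, sigma, w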
RBFN Initialization: Example

Simple radial basis function network for the biimplication $x_1 \leftrightarrow x_2$:

x1  x2  y
0   0   1
1   0   0
0   1   0
1   1   1

(Figure: network with four hidden neurons, one centered at each training pattern, connected to the output neuron y with weights $w_1, \dots, w_4$.)
RBFN Initialization: Example

Simple radial basis function network for the biimplication $x_1 \leftrightarrow x_2$, where $\vec{o}_u = (1, 0, 0, 1)^\top$ and

$\mathbf{A} = \begin{pmatrix}
1 & e^{-2} & e^{-2} & e^{-4} \\
e^{-2} & 1 & e^{-4} & e^{-2} \\
e^{-2} & e^{-4} & 1 & e^{-2} \\
e^{-4} & e^{-2} & e^{-2} & 1
\end{pmatrix}
\qquad
\mathbf{A}^{-1} = \frac{1}{D} \begin{pmatrix}
a & b & b & c \\
b & a & c & b \\
b & c & a & b \\
c & b & b & a
\end{pmatrix}$

with

$D = 1 - 4e^{-4} + 6e^{-8} - 4e^{-12} + e^{-16} \approx 0.9287$
$a = 1 - 2e^{-4} + e^{-8} \approx 0.9637$
$b = -e^{-2} + 2e^{-6} - e^{-10} \approx -0.1304$
$c = e^{-4} - 2e^{-8} + e^{-12} \approx 0.0177$

$\vec{w}_u = \mathbf{A}^{-1} \cdot \vec{o}_u = \frac{1}{D} \begin{pmatrix} a + c \\ 2b \\ 2b \\ a + c \end{pmatrix} \approx \begin{pmatrix} 1.0567 \\ -0.2809 \\ -0.2809 \\ 1.0567 \end{pmatrix}$
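The sketch init_simple_rbf from the initialization slide reproduces these numbers, since $\sigma_k = \sqrt{2}/\sqrt{8} = 1/2$ makes the entries of A exactly $e^{-2d^2}$:

import numpy as np

# uses init_simple_rbf defined in the sketch above
inputs = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
targets = np.array([1., 0., 0., 1.])
centers, sigma, w = init_simple_rbf(inputs, targets)
print(sigma)  # 0.5
print(w)      # approx. [ 1.0567 -0.2809 -0.2809  1.0567]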
RBFN Initialization: Example

Simple radial basis function network for the biimplication $x_1 \leftrightarrow x_2$:

(Figure: activation of a single basis function, the superposition of all four basis functions, and the resulting output y over the $x_1 x_2$ plane.)

The initialization already yields a perfect solution of the learning task. Subsequent training is not necessary.
Radial Basis Function Networks: Initialization

Normal radial basis function networks: select a subset of k training patterns as centers.

$\mathbf{A} = \begin{pmatrix}
\mathrm{out}_{v_1}^{(l_1)} & \mathrm{out}_{v_2}^{(l_1)} & \dots & \mathrm{out}_{v_k}^{(l_1)} \\
\mathrm{out}_{v_1}^{(l_2)} & \mathrm{out}_{v_2}^{(l_2)} & \dots & \mathrm{out}_{v_k}^{(l_2)} \\
\vdots & \vdots & & \vdots \\
\mathrm{out}_{v_1}^{(l_m)} & \mathrm{out}_{v_2}^{(l_m)} & \dots & \mathrm{out}_{v_k}^{(l_m)}
\end{pmatrix}
\qquad
\mathbf{A} \cdot \vec{w}_u = \vec{o}_u$

(If the bias $\theta_u$ of the output neuron is to be determined as well, A receives an additional column for it, as in the example on the following slides.)

Since m > k, this system is over-determined and in general has no exact solution. Compute the (Moore-Penrose) pseudo inverse:

$\mathbf{A}^+ = (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top.$

The weights can then be computed by

$\vec{w}_u = \mathbf{A}^+ \cdot \vec{o}_u = (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top \cdot \vec{o}_u,$

which minimizes the sum of squared errors over the training patterns.
RBFN Initialization: Example

Normal radial basis function network for the biimplication $x_1 \leftrightarrow x_2$:

Select two training patterns:

$l_1 = (\vec{\imath}^{\,(l_1)}, \vec{o}^{\,(l_1)}) = ((0, 0), (1))$
$l_4 = (\vec{\imath}^{\,(l_4)}, \vec{o}^{\,(l_4)}) = ((1, 1), (1))$

(Figure: network with two hidden neurons, centered at (0,0) and (1,1), connected to the output neuron y with weights $w_1$, $w_2$ and bias $\theta$.)
RBFN Initialization: Example

Normal radial basis function network for the biimplication $x_1 \leftrightarrow x_2$:

$\mathbf{A} = \begin{pmatrix}
-1 & 1 & e^{-4} \\
-1 & e^{-2} & e^{-2} \\
-1 & e^{-2} & e^{-2} \\
-1 & e^{-4} & 1
\end{pmatrix}$

(the columns correspond to the unknowns $\theta_u$, $w_1$, $w_2$; the first column carries the coefficient $-1$ of the bias)

$\mathbf{A}^+ = (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top = \begin{pmatrix}
a & b & b & a \\
c & d & d & e \\
e & d & d & c
\end{pmatrix}$

where $a \approx 0.1810$, $b \approx -0.6810$, $c \approx 1.1781$, $d \approx -0.6688$, $e \approx 0.1594$.

Resulting weights:

$\vec{w}_u = \begin{pmatrix} \theta_u \\ w_1 \\ w_2 \end{pmatrix} = \mathbf{A}^+ \cdot \vec{o}_u \approx \begin{pmatrix} 0.3620 \\ 1.3375 \\ 1.3375 \end{pmatrix}.$
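A numeric check of this example with NumPy (a sketch; the column ordering of A follows the convention stated above):

import numpy as np

e2, e4 = np.exp(-2), np.exp(-4)
A = np.array([[-1, 1,  e4],
              [-1, e2, e2],
              [-1, e2, e2],
              [-1, e4, 1 ]])
o = np.array([1., 0., 0., 1.])

A_plus = np.linalg.inv(A.T @ A) @ A.T   # Moore-Penrose pseudo inverse
theta, w1, w2 = A_plus @ o
print(theta, w1, w2)  # approx. 0.3620 1.3375 1.3375

# np.linalg.pinv(A) @ o or np.linalg.lstsq(A, o) give the same result.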
RBFN Initialization: Example

Normal radial basis function network for the biimplication $x_1 \leftrightarrow x_2$:

(Figure: the basis functions centered at (0,0) and (1,1) and the resulting output over the $x_1 x_2$ plane; far from both centers the output approaches $-\theta \approx -0.36$.)

The initialization already leads to a perfect solution of the learning task.

This is an accident, because the linear equation system is not over-determined, due to linearly dependent equations.
Radial Basis Function Networks: Initialization

Finding appropriate centers for the radial basis functions:

One approach: k-means clustering (a sketch follows below)

Select randomly k training patterns as centers.
Assign to each center those training patterns that are closest to it.
Compute new centers as the centers of gravity of the assigned training patterns.
Repeat the previous two steps until convergence, i.e., until the centers do not change anymore.
Use the resulting centers for the weight vectors of the hidden neurons.

Alternative approach: learning vector quantization
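A minimal k-means sketch for finding RBF centers (function and parameter names are my own):

import numpy as np

def kmeans(patterns, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # select k training patterns at random as initial centers
    centers = patterns[rng.choice(len(patterns), k, replace=False)]
    for _ in range(max_iter):
        # assign each pattern to its closest center
        d = np.linalg.norm(patterns[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # recompute each center as the center of gravity of its patterns
        new_centers = np.array([patterns[assign == j].mean(axis=0)
                                if np.any(assign == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers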
Radial Basis Function Networks: Training

Training radial basis function networks:
Derivation of the update rules is analogous to that of multilayer perceptrons.

Weights from the hidden to the output neurons.

Gradient:

$\vec{\nabla}_{\vec{w}_u} e_u^{(l)} = \frac{\partial e_u^{(l)}}{\partial \vec{w}_u} = -2 \left(o_u^{(l)} - \mathrm{out}_u^{(l)}\right) \vec{\mathrm{in}}_u^{(l)},$

Weight update rule:

$\Delta \vec{w}_u^{(l)} = -\frac{\eta_3}{2} \, \vec{\nabla}_{\vec{w}_u} e_u^{(l)} = \eta_3 \left(o_u^{(l)} - \mathrm{out}_u^{(l)}\right) \vec{\mathrm{in}}_u^{(l)}$

(Two more learning rates, $\eta_1$ and $\eta_2$, are needed for the center coordinates and the radii.)
Radial Basis Function Networks: Training

Training radial basis function networks:
Center coordinates (weights from the input to the hidden neurons).

Gradient:

$\vec{\nabla}_{\vec{w}_v} e^{(l)} = \frac{\partial e^{(l)}}{\partial \vec{w}_v} = -2 \sum_{s \in \mathrm{succ}(v)} \left(o_s^{(l)} - \mathrm{out}_s^{(l)}\right) w_{sv} \, \frac{\partial \mathrm{out}_v^{(l)}}{\partial \mathrm{net}_v^{(l)}} \, \frac{\partial \mathrm{net}_v^{(l)}}{\partial \vec{w}_v}$

Weight update rule:

$\Delta \vec{w}_v^{(l)} = -\frac{\eta_1}{2} \, \vec{\nabla}_{\vec{w}_v} e^{(l)} = \eta_1 \sum_{s \in \mathrm{succ}(v)} \left(o_s^{(l)} - \mathrm{out}_s^{(l)}\right) w_{sv} \, \frac{\partial \mathrm{out}_v^{(l)}}{\partial \mathrm{net}_v^{(l)}} \, \frac{\partial \mathrm{net}_v^{(l)}}{\partial \vec{w}_v}$
Radial Basis Function Networks: Training

Training radial basis function networks:
Center coordinates (weights from the input to the hidden neurons).

Special case: Euclidean distance

$\frac{\partial \mathrm{net}_v^{(l)}}{\partial \vec{w}_v} = \left( \sum_{i=1}^{n} \left(w_{v p_i} - \mathrm{out}_{p_i}^{(l)}\right)^2 \right)^{-\frac{1}{2}} \left(\vec{w}_v - \vec{\mathrm{in}}_v^{(l)}\right).$

Special case: Gaussian activation function

$\frac{\partial \mathrm{out}_v^{(l)}}{\partial \mathrm{net}_v^{(l)}} = \frac{\partial f_{\mathrm{act}}\!\left(\mathrm{net}_v^{(l)}, \sigma_v\right)}{\partial \mathrm{net}_v^{(l)}} = \frac{\partial}{\partial \mathrm{net}_v^{(l)}} \, e^{-\frac{\left(\mathrm{net}_v^{(l)}\right)^2}{2\sigma_v^2}} = -\frac{\mathrm{net}_v^{(l)}}{\sigma_v^2} \, e^{-\frac{\left(\mathrm{net}_v^{(l)}\right)^2}{2\sigma_v^2}}.$
Radial Basis Function Networks: Training

Training radial basis function networks:
Radii of the radial basis functions.

Gradient:

$\frac{\partial e^{(l)}}{\partial \sigma_v} = -2 \sum_{s \in \mathrm{succ}(v)} \left(o_s^{(l)} - \mathrm{out}_s^{(l)}\right) w_{sv} \, \frac{\partial \mathrm{out}_v^{(l)}}{\partial \sigma_v}.$

Weight update rule:

$\Delta \sigma_v^{(l)} = -\frac{\eta_2}{2} \, \frac{\partial e^{(l)}}{\partial \sigma_v} = \eta_2 \sum_{s \in \mathrm{succ}(v)} \left(o_s^{(l)} - \mathrm{out}_s^{(l)}\right) w_{sv} \, \frac{\partial \mathrm{out}_v^{(l)}}{\partial \sigma_v}.$

Special case: Gaussian activation function

$\frac{\partial \mathrm{out}_v^{(l)}}{\partial \sigma_v} = \frac{\partial}{\partial \sigma_v} \, e^{-\frac{\left(\mathrm{net}_v^{(l)}\right)^2}{2\sigma_v^2}} = \frac{\left(\mathrm{net}_v^{(l)}\right)^2}{\sigma_v^3} \, e^{-\frac{\left(\mathrm{net}_v^{(l)}\right)^2}{2\sigma_v^2}}.$
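A hedged sketch of one online training step combining the three update rules above, for a network with Euclidean distance, Gaussian hidden neurons, and a single linear output neuron (all names and the learning-rate defaults are my own):

import numpy as np

def rbf_train_step(centers, sigmas, w, theta, x, o,
                   eta1=0.01, eta2=0.01, eta3=0.01):
    # forward pass
    diff = x - centers                              # (k, n): in_v - w_v
    net = np.linalg.norm(diff, axis=1)              # distances net_v
    out = np.exp(-net**2 / (2 * sigmas**2))         # Gaussian activations
    y = w @ out - theta                             # linear output neuron
    err = o - y                                     # o_u - out_u

    # output weights and bias: delta_w = eta3 * err * out_v
    w_new = w + eta3 * err * out
    theta_new = theta - eta3 * err                  # d out_u / d theta = -1

    # common factor (o_s - out_s) * w_sv for each hidden neuron v
    back = err * w

    # centers: delta_w_v = eta1 * back * (dout/dnet) * (dnet/dw_v)
    dout_dnet = -net / sigmas**2 * out
    # dnet/dw_v = (w_v - in_v) / net  (Euclidean distance; guard net = 0)
    dnet_dw = -diff / np.where(net > 0, net, 1.0)[:, None]
    centers_new = centers + eta1 * (back * dout_dnet)[:, None] * dnet_dw

    # radii: delta_sigma = eta2 * back * net^2 / sigma^3 * out
    dout_dsigma = net**2 / sigmas**3 * out
    sigmas_new = sigmas + eta2 * back * dout_dsigma

    return centers_new, sigmas_new, w_new, theta_new

Note the sign bookkeeping: an output error err > 0 with a positive output weight moves the corresponding center toward the input x, exactly as the gradient formulas prescribe.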
Radial Basis Function Networks: Generalization

Generalization of the distance function:

Idea: use an anisotropic (direction-dependent) distance function.

Example: Mahalanobis distance

$d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^\top \Sigma^{-1} (\vec{x} - \vec{y})}.$

Example: biimplication $x_1 \leftrightarrow x_2$ with

$\Sigma = \begin{pmatrix} 9 & 8 \\ 8 & 9 \end{pmatrix}$

(Figure: network with a single hidden neuron using the Mahalanobis distance, and its elliptic catchment region in the $x_1 x_2$ plane, stretched along the diagonal so that it covers (0,0) and (1,1) but not (0,1) and (1,0).)
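A sketch of this idea: one Mahalanobis-distance neuron suffices for the biimplication. Sigma is taken from the slide; the center (1/2, 1/2) and the radius 1/3 are assumptions chosen for illustration:

import numpy as np

Sigma = np.array([[9.0, 8.0],
                  [8.0, 9.0]])
Sigma_inv = np.linalg.inv(Sigma)
center = np.array([0.5, 0.5])
radius = 1.0 / 3.0

def mahalanobis(x, y):
    d = x - y
    return np.sqrt(d @ Sigma_inv @ d)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    net = mahalanobis(np.array(x, dtype=float), center)
    out = 1.0 if net <= radius else 0.0   # rectangle activation
    print(x, round(net, 3), out)
# (0,0) and (1,1) fall inside the elliptic catchment region
# (distance ~0.17), (0,1) and (1,0) outside (distance ~0.71).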
Radial Basis Function Networks: Application

Advantages:
simple feed-forward architecture
easy adaptation
hence quick optimization and computation

Applications:
continuous processes that adapt to changes rapidly, e.g.
approximation
pattern recognition
control engineering

Below: examples from [Schwenker et al. 2001]
Example: Handwriting Recognition of Digits

dataset: 20,000 handwritten digits (2,000 samples of each class)
normalized in height and width
represented by 16 x 16 grayscale values $G_{ij} \in \{0, \dots, 255\}$
data source: EU project StatLog [Michie et al. 1994]
Example: Results of RBF Networks

10,000 training samples and 10,000 validation samples (1,000 of each class per set)
training data used for learning the classifiers, validation data for evaluation
initialization by k-means clustering with 200 prototypes (20 for each class)

Method  Explanation                        Test accuracy
C4.5    decision tree                      91.12%
RBF     k-means for RBF centers            96.94%
MLP     1 hidden layer with 200 neurons    97.59%

here: median of test accuracy over three training and validation runs
Example: k-Means Clustering for RBF Initialization

(Figure: 60 cluster centers of the digits after k-means clustering, displayed as 16 x 16 grayscale images.)

shown: 60 cluster centers of the digits after k-means clustering
separate run of the algorithm for every digit, with k = 6
centers were initialized by samples chosen randomly from the training data set
Example: Found RBF Centers after 3 Runs

(Figure: 60 RBF centers of the digits after 3 backpropagation runs of the RBF network, displayed as grayscale images.)

shown: 60 RBF centers of the digits after 3 backpropagation runs of the RBF network
hardly any difference can be seen in comparison to k-means, but ...
Example: Comparison of k-Means and RBF Network

here: Euclidean distances between the 60 RBF centers before and after training

Centers are sorted by class, with the first 6 columns/rows representing the digit 0, the next 6 columns/rows the digit 1, etc.

Distances are coded as grayscale values: the smaller, the darker.

left: many small distances between centers of different classes (e.g. between 2 and 3, and between 8 and 9); these small distances cause misclassifications

right: after training there are no small distances between classes anymore