Radial Basis Function Networks

As we have seen, one of the most common types of neural network is the multi-layer perceptron. It does, however, have various disadvantages, including its slow speed of learning. In this lecture we consider an alternative type: the Radial Basis Function (RBF) network. See Broomhead DS and Lowe D, 1988, Multivariable functional interpolation and adaptive networks, Complex Systems, 2, 321-355. First we will make a note of fitting models to data.

Fitting Data Points
[Figure: two plots of the same x-y data, one fitted with a single straight line, the other with a curve.]
In the first figure, a straight line is set as the best approximation for the data; this can be done by minimising the square of the errors. A better solution is a curve, which could be built from a series of lines, each valid around an operating point. This idea leads to RBF nets. In 3D the lines become planes; in more dimensions, hyperplanes.
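As a concrete illustration of the straight-line case, here is a minimal MATLAB sketch that fits a line by minimising the squared error using polyfit (the data values are made up for illustration):

% least-squares straight-line fit to illustrative data
x = [0 1 2 3 4 5];                  % inputs
y = [0.1 0.9 2.3 2.8 4.1 5.2];      % noisy targets
p = polyfit(x, y, 1);               % p(1) = slope, p(2) = intercept of the best-fit line
yfit = polyval(p, x);               % line evaluated at the inputs
sse = sum((y - yfit).^2);           % the squared error that was minimised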

Centres
The basic idea is that there are centres, and at each centre there is a simple model. This model is a basis function, which processes the input data according to its distance from the associated centre. Each centre is like a neuron, and the basis function is similar to an MLP's activation function. The output(s) of the network are the weighted sum of the outputs of each neuron. Overall, an RBF network has an input layer, comprising the inputs to the network; a hidden layer, comprising the basis function neurons; and an output layer, which takes a weighted sum of the hidden layer outputs.

RBF Network
[Figure: network diagram. The inputs feed the hidden units, each of which applies a radial basis function to the inputs; the output units form a weighted sum of the hidden unit outputs plus a bias.]

Basis Functions
A basis function takes in a scalar number and returns a scalar number, which is the output of the hidden layer node. Examples of common basis functions:
Thin plate spline: φ(x) = x^2 log x
Gaussian: φ(x) = exp(-x^2 / 2w), where w represents its width.
What happens in a hidden layer node? The Euclidean distance between the input vector and the centre vector is found, using n-dimensional Pythagoras:
sqrt( (x1 - c1)^2 + (x2 - c2)^2 + ... + (xn - cn)^2 )
The distance produced is passed through the basis function to produce the output of the hidden layer node.
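A minimal MATLAB sketch of these two basis functions, written as anonymous functions of the distance (the width value and example input are illustrative):

% the two basis functions as anonymous functions of the (scalar) distance x
thinplate = @(x) (x.^2) .* log(x);       % thin plate spline (treat phi(0) = 0 as a special case)
w = 0.5;                                 % assumed Gaussian width
gaussfn = @(x) exp(-(x.^2) ./ (2*w));    % Gaussian of width w
X = [0 1]; C = [1 1];                    % example input and centre
d = sqrt(sum((X - C).^2));               % n-dimensional Pythagoras
h = gaussfn(d)                           % hidden node output, here exp(-1) = 0.3679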

Specific Equations
Let there be m neurons in the hidden layer and k outputs. Let the input be X = [x1, x2 ... xn], let the rth centre be Cr = [cr1, cr2 ... crn], let the basis function be φ(x), and let the weight of the connection from the ith neuron to the jth output be wij. Then the jth output is
yj = w0j + Σ (i = 1..m) wij * φ( ||X - Ci|| )
where ||X - Ci|| = sqrt( Σ (r = 1..n) (xr - cir)^2 )
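A minimal vectorised sketch of this output equation, assuming a Gaussian basis function and using, for concreteness, the centres and weights of the XOR example on the next slide:

% y_j = w_0j + sum over i of w_ij * phi(||X - C_i||)
X = [0 1];                            % input row vector
C = [1 0; 1 0];                       % one centre per column: C1 = [1;1], C2 = [0;0]
wj = [2.84; -2.5; -2.5];              % [w_0j; w_1j; w_2j]
dist2 = sum((repmat(X', 1, size(C,2)) - C).^2, 1);   % squared distance to every centre
phi = exp(-dist2);                    % Gaussian basis outputs (width w = 0.5)
yj = wj(1) + phi * wj(2:end)          % jth network output, approximately 1.0006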

Example: RBF to Evaluate XOR
The centres are chosen as C1 = [1 1]^T and C2 = [0 0]^T. The weights of the output node are w0 = 2.84, w1 = w2 = -2.5. We use the Gaussian basis function exp(-dist^2). Let the input be X = [0 1]; for XOR the desired output is +1.
Distance squared of X from C1 = [1 1]: (0 - 1)^2 + (1 - 1)^2 = 1. Output of neuron 1 = Gaussian(1) = e^-1 = 0.3679.
Distance squared of X from C2 = [0 0]: (0 - 0)^2 + (1 - 0)^2 = 1. Output of neuron 2 = Gaussian(1) = e^-1 = 0.3679.
Output = (0.3679 * -2.5) + (0.3679 * -2.5) + 2.84 = 1.0006

In MATLAB
function [output, houts] = rbf_calc (rbfnet, input)
% [OUTPUT, HOUTS] = RBF_CALC (RBFNET, INPUT)
% RBFNET is a struct with three fields
%   CENTRES  each column of the matrix is the centre for one neuron
%   RADIUS   radius^2 (of the Gaussian basis function) of each neuron
%   WEIGHTS  the weights (including bias) for OUTPUT
% Dr Richard Mitchell 31/07/03
houts = [1];                            % outputs of hidden nodes: first is 'input' to bias = 1
for ct = 1:size(rbfnet.centres,2)       % for each hidden node
    distsq = sum((input'-rbfnet.centres(:,ct)).^2);        % dist squared from its centre
    houts = [houts, exp(-distsq/(2*rbfnet.radius(ct)))];   % append output of basis func
end
output = dot (houts, rbfnet.weights);   % compute actual output

Continued, and Then
>> rbfn = struct('centres', [1 1; 0 0]', 'radius', [0.5 0.5], 'weights', [2.84 -2.50 -2.50])
>> rbf_calc(rbfn, [0 1])
ans = 1.0006
>> xor = [0 0; 0 1; 1 0; 1 1];   % xor test set
>> for ct = 1:4, xo(ct) = rbf_calc(rbfn, xor(ct,:)); end; xo
xo = 0.0017 1.0006 1.0006 0.0017
So the network can solve XOR (and hence other hard problems). But how are the centres chosen, the radii found and the weights set?

Designing an RBF: How to Choose Centres
The centres are chosen to represent the input training data set, and the correct choice of centres is critical for good performance. The number of centres matters too: too many or too few leads to inaccuracy, and there is currently no rigorous formal method to find the optimum number. Methods for choosing centres include simple distributions (a uniform or a Gaussian distribution over the data space) and distributions related to the data distribution, e.g. the use of clustering algorithms such as Mean Tracking, as in the sketch below.
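Mean Tracking is not a built-in MATLAB routine, so as an illustrative stand-in the following sketch picks centres by k-means clustering (kmeans is in the Statistics and Machine Learning Toolbox; the training-input matrix traindata and the number of centres m are assumptions):

% choose m centres by clustering the training inputs (k-means as a stand-in for Mean Tracking)
% traindata has one training input per row
m = 5;                                  % assumed number of centres
[idx, centres] = kmeans(traindata, m);  % centres are returned one per row
rbfnet.centres = centres';              % store one centre per column, as rbf_calc expects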

Finding the Radius of Each Centre
Once each centre is found, we need to find the radius of the Gaussian basis function used at that centre. The radii should be set so that the Gaussian from one centre overlaps with nearby centres to a certain extent; this ensures a smooth transition across the data space. Picton's method: for each centre find the P (typically 2) closest centres. If Cj is the centre of interest and Cj1 .. CjP are the closest centres to Cj, then the radius is set by
σj = (1/P) Σ (i = 1..P) ||Cj - Cji||
Remember the Gaussian is φj(x) = exp( -x^2 / (2 σj^2) ).
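A minimal sketch of this rule, assuming the centres are stored one per column (as rbf_calc expects) and remembering that rbf_calc keeps σ^2 in its radius field:

% sigma_j = mean distance from centre j to its P closest centres
Cs = rbfnet.centres;                            % one centre per column
m = size(Cs, 2);  P = 2;                        % typically the 2 closest centres
sigma = zeros(1, m);
for j = 1:m
    d = sqrt(sum((Cs - repmat(Cs(:,j), 1, m)).^2, 1));   % distance from centre j to every centre
    d = sort(d);                                % d(1) = 0, the distance to itself
    sigma(j) = mean(d(2 : P+1));                % average distance to the P closest centres
end
rbfnet.radius = sigma.^2;                       % rbf_calc stores sigma^2 in its radius field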

Training the RBF
Once the centres and their radii are chosen, all that is left is to choose the weights. This is (compared with MLPs) very easy. As for MLPs, we have a training set with inputs and expected outputs. For the pth item in the training set we calculate the output of each hidden node:
hip = φ(Xp, Ci, Ri)
where Xp is the pth input from the training set, and Ci and Ri are the centre and radius of the ith hidden node. The network output (whose value we know) is
yp = w0 + Σ (i) wi * φ(Xp, Ci, Ri) = w0 + Σ (i) wi * hip

Thus, For the Training Set We Can Say
y1 = w0 + w1*h11 + w2*h21 + ... + wn*hn1
y2 = w0 + w1*h12 + w2*h22 + ... + wn*hn2
:
ym = w0 + w1*h1m + w2*h2m + ... + wn*hnm
This can be written in matrix form:
    [ y1 ]   [ 1  h11  h21 ... hn1 ] [ w0 ]
Y = [ y2 ] = [ 1  h12  h22 ... hn2 ] [ w1 ]
    [ :  ]   [ :   :    :       :  ] [ :  ]
    [ ym ]   [ 1  h1m  h2m ... hnm ] [ wn ]
More compactly: Y = HW. We want W, and know Y and H.

Solving to Find W
We know the expected Y, and can find H for the whole training set. Then W = H^-1 * Y. If the number of items in the training set equals the number of weights, there is an exact solution, provided the equations are independent. In general there are more items in the training set than weights. However, this is a linear matrix task, which is easy to solve. If H has all the h values (including the dummy 1s), and Y the expected y values, then in MATLAB W is found simply by
>> W = H \ Y    % matrix division of H into Y
This finds the best solution in the least squares sense. Singular Value Decomposition (SVD) can also be used.

MATLAB Session
Using the network given earlier, and the XOR test set:
>> for ct = 1:4, [a, h(ct,:)] = rbf_calc(rbfn, xor(ct,:)); end, h
h =
    1.0000    0.1353    1.0000
    1.0000    0.3679    0.3679
    1.0000    0.3679    0.3679
    1.0000    1.0000    0.1353
>> rbfn.weights = h \ [0; 1; 1; 0]   % to find the weights
rbfn.weights =
    2.8413
   -2.5027
   -2.5027
These weights are more accurate than the values quoted earlier.

Example: Demand = f(Temp, Illum)
This is the same problem as used for MLPs. We have a training set and an unseen set. The steps (sketched in code below) are:
Cluster the training data; for simplicity, the radii are set so that circles encircle all of each cluster.
Set up an RBF network based on the cluster centres, with zero weights.
Compute the output of the basis functions.
Use these and the training values of Demand to compute the weights.
Find the average mean square error (MSE) between Demand and the RBF output.
Input the unseen data to the trained RBF and compute its MSE.
The training error is 1.18 and the unseen error 1.55: not bad. This is better than the MLP, though more work was needed on the MLP. On the next sheet, see the tadpole plots.
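A hedged end-to-end sketch of these steps, reusing rbf_calc; the variable names traindata, trainout, unseen and unseenout are assumptions, and kmeans again stands in for the lecture's clustering step:

% 1. cluster the training inputs and set radii that just encircle each cluster
m = 5;                                       % assumed number of centres
[idx, centres] = kmeans(traindata, m);
rbfnet.centres = centres';
for j = 1:m
    members = traindata(idx == j, :);
    d = sqrt(sum((members - repmat(centres(j,:), size(members,1), 1)).^2, 2));
    rbfnet.radius(j) = max(d)^2;             % encircles the cluster (stored as sigma^2)
end
% 2. compute the basis function outputs H for every training item (leading 1 is the bias input)
nt = size(traindata, 1);
H = zeros(nt, m+1);
for p = 1:nt
    [dummy, H(p,:)] = rbf_calc(rbfnet, traindata(p,:));
end
% 3. least-squares weights, then mean square errors on trained and unseen data
rbfnet.weights = H \ trainout;               % trainout: column of training Demand values
trainMSE = mean((H * rbfnet.weights - trainout).^2);
yo = zeros(size(unseen,1), 1);
for p = 1:size(unseen, 1)
    yo(p) = rbf_calc(rbfnet, unseen(p,:));
end
unseenMSE = mean((yo - unseenout).^2);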

Errors on Trained and Unseen Data
[Figure: tadpole plots of the errors, for the training data (top, about 250 items) and the unseen data (bottom, about 25 items).]

Comparison of RBFs and MLPs
Differences:
Feature              MLP                            RBF
Variables            Weights and thresholds         Centres and coefficients
Speed of training    Slow, computationally costly   Fast(er), less computation
Accuracy             Very good                      Not as good
Math description     Difficult                      Easy
Similarities: both learn from a set of training data; both have the ability to generalise from the training data; both store information in a distributed manner; both can be implemented using parallel processing.