CSE 5526: Introduction to Neural Networks
Radial Basis Function (RBF) Networks (Part IV)


Function approximation

The MLP is both a pattern classifier and a function approximator. As a function approximator, the MLP is nonlinear, semiparametric, and universal.

Function approximation background

Weierstrass theorem: any continuous real function on a closed interval can be approximated arbitrarily well by polynomials. A Taylor expansion approximates any differentiable function by polynomials in a neighborhood of a point. A Fourier series approximates any periodic function by a sum of sinusoids.

Linear projection

Approximate a function f(x) by a linear combination of simpler functions:

$F(x) = \sum_j w_j \varphi_j(x)$

If the weights $w_j$ can be chosen so that the approximation error is arbitrarily small for any function f(x) over the domain of interest, then $\{\varphi_j\}$ has the property of universal approximation, or $\{\varphi_j\}$ is complete.

Example bases

The sinc function: $\mathrm{sinc}(x) = \sin(x)/x$

Example bases (cont.)

The sine function.

Radial basis functions

Consider

$\varphi_j(x) = \exp\left(-\frac{1}{2\sigma^2}\|x - x_j\|^2\right) = G(\|x - x_j\|)$

A Gaussian is a local basis function, falling off exponentially from the center. (Figure: $G(\|x - x_j\|)$ peaks at 1 at the center $x_j$ and decays with distance from it.)

Radial basis functions (cont.)

Thus approximation by RBFs becomes

$F(x) = \sum_j w_j G(\|x - x_j\|)$
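To make the expansion concrete, here is a minimal NumPy sketch of evaluating it; the function names and array shapes are illustrative assumptions, not from the lecture:

```python
import numpy as np

def gaussian_rbf(X, center, sigma):
    """G(||x - x_j||) = exp(-||x - x_j||^2 / (2 sigma^2)), for each row of X."""
    return np.exp(-np.sum((X - center) ** 2, axis=-1) / (2 * sigma ** 2))

def rbf_expansion(X, centers, weights, sigma):
    """F(x) = sum_j w_j G(||x - x_j||), evaluated for each row of X."""
    # Design matrix Phi: one column of Gaussian responses per center
    Phi = np.stack([gaussian_rbf(X, c, sigma) for c in centers], axis=-1)
    return Phi @ weights
```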

RBF approximation illustration

(Figure: a function approximated by a weighted sum of Gaussian bases.)

Remarks

Gaussians are universal approximators.

Remarks (cont.)

Such a radial basis function is called a kernel, a term from statistics. As a result, RBF nets are a kind of kernel method. Other RBFs exist, such as multiquadrics (see the textbook).

Four questions to answer for RBF nets

1. How to identify the Gaussian centers?
2. How to determine the Gaussian widths?
3. How to choose the weights $w_j$?
4. How to select the number of bases?

Gaussian centers

Identify Gaussian centers via unsupervised clustering: the K-means algorithm. The goal of the K-means algorithm is to divide N input patterns into K clusters so as to minimize the final variance. In other words, partition the patterns into K clusters $C_j$ to minimize the cost function

$J = \sum_{j=1}^{K} \sum_{i \in C_j} \|x_i - u_j\|^2$

where $u_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$ is the mean (center) of cluster j.

K-means algorithm

1. Choose a set of K cluster centers randomly from the input patterns.
2. Assign the N input patterns to the K clusters using the squared Euclidean distance rule: x is assigned to $C_j$ if $\|x - u_j\|^2 \le \|x - u_i\|^2$ for all $i \ne j$.

K-means algorithm (cont.)

3. Update the cluster centers: $u_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$
4. If any cluster center changes, go to step 2; otherwise stop.

Remark: the K-means algorithm always converges, but reaching the global minimum is not assured.
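The four steps translate directly into code. A minimal NumPy sketch of the algorithm as stated; the names and the empty-cluster handling are my assumptions:

```python
import numpy as np

def kmeans(X, K, rng=None, max_iter=100):
    rng = np.random.default_rng(rng)
    # Step 1: choose K cluster centers randomly from the input patterns
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each pattern to the nearest center (squared Euclidean)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: update each center to the mean of its cluster
        # (a cluster that loses all its patterns keeps its old center)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(K)])
        # Step 4: if no center changed, stop; otherwise repeat from step 2
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```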

Calculating Gaussian widths

Once the cluster centers are determined, the variance within each cluster can be set to

$\sigma_j^2 = \frac{1}{|C_j|} \sum_{i \in C_j} \|x_i - u_j\|^2$

Remark: to simplify the RBF net design, different clusters often assume the same Gaussian width

$\sigma = \frac{d_{\max}}{\sqrt{2K}}$

where $d_{\max}$ is the maximum distance between cluster centers.
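Both width choices follow mechanically from the clustering result. A minimal sketch continuing from the kmeans output above (names are illustrative; empty clusters are assumed not to occur):

```python
import numpy as np

def gaussian_widths(X, centers, labels):
    """Per-cluster variances sigma_j^2 and the shared width d_max / sqrt(2K)."""
    K = len(centers)
    # sigma_j^2: mean squared distance of cluster j's patterns to its center
    sigma2 = np.array([np.mean(np.sum((X[labels == j] - centers[j]) ** 2, axis=1))
                       for j in range(K)])
    # Shared width: maximum distance between any two centers over sqrt(2K)
    d_max = max(np.linalg.norm(ci - cj) for ci in centers for cj in centers)
    return sigma2, d_max / np.sqrt(2 * K)
```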

Weight update

With the hidden layer decided, weight training can be treated as a linear regression problem, and the LMS algorithm is an efficient way to update the weights. Note that a bias term needs to be included.
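As a hedged sketch, the same regression can be solved in one shot by batch least squares rather than the iterative LMS rule the lecture names; the bias enters as an appended constant column:

```python
import numpy as np

def train_weights(X, y, centers, sigma):
    """Solve for the output weights (and bias) by linear least squares."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * sigma ** 2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])  # constant column = bias term
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w  # w[:-1] are the RBF weights, w[-1] is the bias
```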

Selection of the number of bases: the bias-variance dilemma

This is the same problem as selecting the size of an MLP for classification: the problem of overfitting. Example: consider polynomial curve fitting, approximating $f(x) = 0.5 + 0.4\sin(2\pi x)$ by an M-th order polynomial

$F(x) = \sum_{j=0}^{M} w_j x^j$

Overfitting: $F(x) = \sum_{j=0}^{M} w_j x^j$

(Figure: polynomial fits for M = 1, M = 3, and M = 10.)
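The effect is easy to reproduce. A minimal sketch of this experiment (the sample size and noise level are my assumptions): the training error keeps shrinking as M grows, even though the M = 10 fit generalizes poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 25)
y = 0.5 + 0.4 * np.sin(2 * np.pi * x) + rng.normal(0.0, 0.05, x.size)

for M in (1, 3, 10):
    coeffs = np.polyfit(x, y, M)                     # degree-M least-squares fit
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"M = {M:2d}: training MSE = {train_mse:.6f}")
```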

Occam's razor

The best scientific model is the simplest one that is consistent with the data. In our case, it translates to the principle that a learning machine should be large enough to approximate the data well, but no larger. Occam's razor is a general principle governing supervised learning and generalization.

Bias and variance

Bias: training error, the difference between the desired output and the actual output on a particular training sample. Variance: generalization error, the difference between the function learned from a particular training sample and the function derived from all training samples. Example: consider the two extreme cases of zero bias and zero variance.

Bias and variance (cont.)

The optimal size of a learning machine is thus a compromise between the bias and the variance of the model. In other words, a good-sized model is one where both bias and variance are low. For RBF nets, in practice, cross-validation can be applied to select the number of bases, where the validation error is measured for a series of candidate numbers of bases.
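A minimal sketch of that selection loop, reusing the kmeans, gaussian_widths, and train_weights sketches above; the fold layout and names are my assumptions:

```python
import numpy as np

def select_num_bases(X, y, candidate_Ks, n_folds=5):
    """Pick the number of bases K with the lowest cross-validation error."""
    fold = np.arange(len(X)) % n_folds          # simple round-robin folds
    cv_error = {}
    for K in candidate_Ks:
        errs = []
        for f in range(n_folds):
            tr, va = fold != f, fold == f
            centers, labels = kmeans(X[tr], K, rng=f)
            _, sigma = gaussian_widths(X[tr], centers, labels)
            w = train_weights(X[tr], y[tr], centers, sigma)
            # Validation error of the trained net on the held-out fold
            d2 = ((X[va][:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            Phi = np.hstack([np.exp(-d2 / (2 * sigma ** 2)),
                             np.ones((int(va.sum()), 1))])
            errs.append(np.mean((Phi @ w - y[va]) ** 2))
        cv_error[K] = np.mean(errs)
    return min(cv_error, key=cv_error.get), cv_error
```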

XOR problem, again

RBF nets can also be applied to pattern classification problems. Consider the XOR problem revisited.

XOR problem (cont.)

(Figure: the XOR patterns in the space of the Gaussian hidden-unit outputs.)
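The classic construction can be checked in a few lines: with one Gaussian unit centered on each pattern of one class, the four XOR points become linearly separable in the hidden space. A minimal sketch, where the width value is an illustrative assumption:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # the two class-0 patterns
sigma = 1.0 / np.sqrt(2)                      # illustrative width

d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
Phi = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((4, 1))])  # + bias

w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # exact: duplicate rows share targets
print((Phi @ w > 0.5).astype(int))            # -> [0 1 1 0]
```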

Comparison between RBF nets and MLPs

For RBF nets, the bases are local, while for MLPs the bases are global. Generally, an RBF net needs more bases than an MLP needs hidden units. Training is more efficient for RBF nets.