Backpropagation: In Search of Performance Parameters


ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D.
Computer Science Department, Texas State University-San Marcos
San Marcos, TX 78666, USA
ae049@txstate.edu, lb40@txstate.edu, 0@TxState.edu

Abstract: This work is an extensive study of the backpropagation network based on a new visual tool, Equal Opportunity for Recognition (EOR) for all inputs to be recalled, which is used to evaluate overall network performance, in particular its generalization capabilities. The new procedure, EOR, is also used as a means to assess the effect of other system parameters.

Keywords: backpropagation, network evaluation, generalization, EOR, Processing Elements (PEs), parameters

1 Introduction

A backpropagation network is a multilayer, associative, feed-forward neural network that features supervised learning using a gradient descent training procedure. Backpropagation is widely used in applications involving pattern recognition because of its powerful generalization capability. While its system structure and learning algorithm are well documented, no mathematical criteria exist to assess the performance, particularly the generalization capabilities, of the network with respect to such network parameters as the number of PEs on the hidden layer, the mean squared error (MSE), the learning rates, and the initialization of weights and thresholds. Searching for a measure of system performance, we propose a visual method, the EOR plot, which can be used as an indicator of overall system performance. With the aid of EOR plotting, we further study the various parameters of the system as they relate to overall system behavior, including the MSE, the hidden layer size, the learning rates, weight and threshold initialization, and threshold updating.

2 Description of Experiments

To study the various properties of a backpropagation network, we started with the 26 capital letters of the English alphabet, each represented on a 24 by 24 grid. Each grid was converted to a binary vector of 576 elements.

Fig. 1: Input patterns

Each binary vector is associated with the 8-bit ASCII code corresponding to the English letter. Since one hidden layer is generally sufficient for most applications [4], we designed a backpropagation network of three layers: an input layer with 576 PEs, an output layer with 8 PEs, and a hidden layer with a varying number of PEs. In the light of the EOR (Equal Opportunity for Recognition) plot presented below, we studied all parameters of the backpropagation network based on our implementation in Mathematica 4.
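
The paper's Mathematica source is not reproduced here. As a rough Python sketch of the encoding step (the `letter_grid` stand-in below is ours; the original work rasterized actual letter shapes, which we do not have), each letter becomes a 576-element binary input vector paired with its 8-bit ASCII code as the target:

```python
import numpy as np

def letter_grid(char, size=24):
    # Hypothetical stand-in for the paper's 24x24 letter bitmaps: a
    # deterministic pseudo-random binary grid per character, in place of
    # the rasterized letter shapes used in the original experiments.
    rng = np.random.default_rng(ord(char))
    return (rng.random((size, size)) < 0.5).astype(np.uint8)

def encode_input(char):
    # Flatten the 24x24 grid into the 576-element binary input vector.
    return letter_grid(char).reshape(576).astype(float)

def ascii_target(char):
    # 8-bit ASCII code of the letter as an 8-element binary target vector.
    return np.unpackbits(np.array([ord(char)], dtype=np.uint8)).astype(float)

letters = [chr(c) for c in range(ord('A'), ord('Z') + 1)]
inputs = [encode_input(c) for c in letters]    # 26 vectors of length 576
targets = [ascii_target(c) for c in letters]   # 'A' (65) -> 0,1,0,0,0,0,0,1
```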

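A minimal sketch of the three-layer network just described, assuming sigmoid PEs with a per-unit threshold as in Eq. (1) of Section 4.5 (the class and parameter names are ours, not the authors'):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNetwork:
    # 576-H-8 feed-forward network: one hidden layer of H PEs, sigmoid
    # activations, and a threshold on every hidden and output PE.
    def __init__(self, n_in=576, n_hidden=9, n_out=8, init_range=1.0, seed=0):
        rng = np.random.default_rng(seed)
        u = lambda *shape: rng.uniform(-init_range, init_range, shape)
        self.w1, self.th1 = u(n_hidden, n_in), u(n_hidden)
        self.w2, self.th2 = u(n_out, n_hidden), u(n_out)

    def forward(self, x):
        h = sigmoid(self.w1 @ x - self.th1)   # hidden layer outputs
        o = sigmoid(self.w2 @ h - self.th2)   # 8 outputs, one per ASCII bit
        return o, h

net = BPNetwork()
o, _ = net.forward(np.zeros(576))
print(o.shape)   # (8,)
```
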
3 EOR

Backpropagation is a simple and powerful algorithm, yielding satisfactory results if properly implemented. Mathematical criteria, however, are still to be found that can be employed to evaluate system performance with respect to such network parameters as the MSE, the hidden layer size, the initial weights, and the learning rates. Many rules for choosing the hidden layer size have been proposed; however, none of them appears to be superior, and all are the result of empirical conjecture. To guarantee the applicability of a network, some measures have to be taken to assess system performance. To avoid overtraining, for example, constant monitoring of system performance is necessary, including the incorporation of test data into the training process.

Given a specific application, such as the recognition of the 26 capital English letters, noise reduction and generalization in the presence of random noise are among the essential requirements of the network. In other words, we need to probe the probabilistic performance of a network so that, first, all input patterns can be recovered successfully with equal opportunity, and second, the probability that an individual input can be recovered meets the requirements of the application. Both factors are related to all the parameters of a network. In the absence of a mathematical description, we propose the EOR (Equal Opportunity for Recognition) plot as a visual, probabilistic method to evaluate system performance. Given a set of system parameters, including the MSE, the initial weights, the thresholds, the learning rates, and the hidden layer size, we train the network and estimate the probability that each individual input pattern is correctly recognized at a specified rate of random noise. This can be done by repeating the recall process on a sufficiently large number of randomly corrupted inputs and monitoring the behavior of the network. After all individual inputs have been processed, the performance of the network can be analyzed using EOR plots.

Using 9 hidden layer neurons with a range of -1.00 to +1.00 for weight and threshold random initialization, a learning rate of 1, an MSE of 0.05, and random noise rates of 10% and 5%, respectively, we obtained the following EOR plots as an estimate of network performance.

Fig. 2: EOR plots (recognition probability vs. number of iterations, in hundreds) for 10% and 5% noise, respectively

According to the two EOR plots, with 10% random noise each input pattern can be correctly recognized with a probability of over 90%, in spite of slight variations; with 5% random noise, all patterns can be recognized. Fig. 3 depicts sample letters with 10% random noise.

Fig. 3: Specimens with 10% random noise

All corrupted patterns can be correctly recovered with a probability of more than 90%. As shown by our experiments, EOR plots can serve as an objective description of system performance, and they can be utilized in analyzing other network parameters.
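
The estimation procedure behind an EOR plot is a short Monte Carlo loop. A sketch under our assumptions (it reuses the `BPNetwork` forward pass sketched in Section 2, and counts a recall as correct when every thresholded output bit matches the letter's ASCII code, which is our reading of "correctly recognized"):

```python
import numpy as np

def corrupt(x, noise_rate, rng):
    # Flip the stated fraction of the 576 binary pixels at random.
    y = x.copy()
    flips = rng.choice(y.size, size=int(round(noise_rate * y.size)), replace=False)
    y[flips] = 1.0 - y[flips]
    return y

def eor_probabilities(net, inputs, targets, noise_rate=0.10, trials=500, seed=1):
    # Estimate, for each input pattern, the probability of correct recall
    # under random noise; recording these vectors as training progresses
    # gives the probability-vs-iterations curves of an EOR plot.
    rng = np.random.default_rng(seed)
    probs = []
    for x, t in zip(inputs, targets):
        hits = sum(
            np.array_equal(net.forward(corrupt(x, noise_rate, rng))[0] > 0.5, t > 0.5)
            for _ in range(trials)
        )
        probs.append(hits / trials)
    return np.array(probs)   # one recognition probability per letter
```

Plotting these per-letter probabilities against the number of training iterations (in hundreds) reproduces the axes of Figs. 2 through 8.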

4 Results and Analysis

4.1 Mean Squared Error (MSE)

The MSE is generally used as an indicator of network convergence. However, the MSE by itself is not a sufficient factor, and other network characteristics need to be considered. First, we show that the MSE is not always a sufficient descriptor of system performance. Using 8 hidden layer PEs and a fixed MSE of 0.05 with different ranges for the random initialization of weights and thresholds, we obtained the following 5% random noise EOR plots. In Figure 4(a), the range of weight and threshold initialization is -1.0 to +1.0; in Figure 4(b), the range is -0.5 to +0.5. Although both networks were trained to the same MSE, the two EOR plots differ, so the MSE alone does not determine recall performance.

Fig. 4: EOR plots for different weight and threshold initializations

Second, given a specific network topology, a smaller MSE does not always yield better system performance. As shown by our experiments, after a certain point the EOR plot remains virtually the same, with no evidence of over-fitting. The following results were obtained using 8 hidden layer nodes, a learning rate of 1, and a range of -0.5 to +0.5 for weight and threshold random initialization, at MSEs of 0.5, 0.05, and 0.001.

Fig. 5: EOR plots for different MSE values: (a) MSE of 0.5, (b) MSE of 0.05, (c) MSE of 0.001

Therefore, while the MSE is an important factor of a backpropagation network, it is not sufficient for drawing conclusions about system performance. Other factors, including the weight initialization and the size of the hidden layer, also play an important role.
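
In training, the target MSE acts purely as a stopping test. A minimal sketch, assuming a `train_epoch` callback that performs one backpropagation pass over all patterns (the callback and its name are ours, not the paper's):

```python
import numpy as np

def mse(net, inputs, targets):
    # Mean squared output error over all patterns: the convergence indicator.
    return float(np.mean([np.mean((net.forward(x)[0] - t) ** 2)
                          for x, t in zip(inputs, targets)]))

def train_to_mse(net, train_epoch, inputs, targets, target_mse=0.05, max_iter=5000):
    # Train until the specified MSE is reached; the iteration count at which
    # this happens is the quantity Fig. 9 relates to the hidden layer size.
    for it in range(1, max_iter + 1):
        train_epoch(net, inputs, targets)
        if mse(net, inputs, targets) <= target_mse:
            return it     # converged at iteration `it`
    return None           # no convergence (cf. 3 hidden PEs in Sec. 4.3)
```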

4.2 Weight Initialization

In a three-layer network there are two weight sets. As a general rule, the weights should be randomly initialized to small values to avoid system oscillation, as justified by the derivative of the activation function. We started with a range of -1.0 to +1.0 and gradually reduced the range, and we observed that smaller random initialization yields better performance. For the following graphs, a network with 8 hidden layer nodes was used, together with an MSE of 0.05, a learning rate of 1, and different ranges for weight and threshold initialization.

Fig. 6: EOR plots using various ranges for weight and threshold random initialization: (a) -1 to +1, (b) -.5 to +.5, (c) -.1 to +.1, (d) -.001 to +.001, and (e) -.00001 to +.00001, respectively

Although the weights could all be initialized to zero, this would result in a highly symmetrical network and is therefore not a good choice for network design. This emphasizes the statement made by Rumelhart et al. [6]: initial weights of exactly 0 cannot be used, since symmetries in the environment are not sufficient to break symmetries in the initial weights.
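
The initialization experiment amounts to drawing both weight sets and all thresholds from a symmetric uniform interval and shrinking that interval; the exact range values below are our reading of Fig. 6:

```python
import numpy as np

ranges = [1.0, 0.5, 0.1, 0.001, 0.00001]   # the intervals (-r, +r) of Fig. 6

rng = np.random.default_rng(42)
for r in ranges:
    w1 = rng.uniform(-r, r, (8, 576))   # input-to-hidden weights
    th1 = rng.uniform(-r, r, 8)         # hidden thresholds
    w2 = rng.uniform(-r, r, (8, 8))     # hidden-to-output weights
    th2 = rng.uniform(-r, r, 8)         # output thresholds
    # ... train to the target MSE and record an EOR plot for this range ...
    print(f"+/-{r}: mean |w1| = {np.abs(w1).mean():.2e}")

# r = 0 (all weights zero) is excluded: with identical weights every hidden
# PE computes the same function, and training never breaks the symmetry [6].
```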

4.3 Number of PEs on the Hidden Layer

To study the effect of the number of PEs on the hidden layer on system performance, we performed a series of experiments in which all weights and thresholds were randomly initialized between -0.5 and +0.5, with a fixed MSE of 0.05 and a learning rate of 1. When the weights are initialized to very small values, the same MSE yields similar system performance regardless of the range of initialization, and this setting can thus be used to compare the effect of the number of PEs on the hidden layer. With a small number of PEs on the hidden layer, compared to the input and output layers, the learning curve exhibits a great deal of fluctuation and does not converge to the specified MSE. This implies that the network does not have enough learning capacity, i.e., memory. With 3 hidden layer PEs, we observed the following results:

Fig. 7: Learning curve and EOR plot for a network with 3 PEs on the hidden layer

With more PEs on the hidden layer, more input patterns can be correctly recovered. Once the number of hidden layer PEs reaches an ideal range, the system performance stabilizes and shows very little improvement with the addition of new PEs. The following EOR plots are based on 5, 7, 9, 12, 24, 36, 48, and 100 PEs, respectively.

Fig. 8: EOR plots for networks with 5, 7, 24, 48, and 100 PEs on the hidden layer, respectively

Hidden layer PEs are the feature extractors. As the hidden layer size increases, for a fixed error, the number of iterations needed to train the network converges to a value and does not oscillate. This tells us that beyond a certain limit the hidden layer size has no effect on the number of iterations. Although increasing the hidden layer size brings down the number of iterations, there may not be much improvement in the total training time.

We observed that, for a fixed MSE, the number of iterations is related to the number of PEs on the hidden layer by the following curve.

Fig. 9: Relationship between hidden layer size and number of training iterations

4.4 Learning Rates

While learning rates are generally taken to be small numbers between 0 and 1, there is no criterion governing the selection of a learning rate. If it is too small, the error correction is trivial and the network does not learn well, with little chance of getting out of a local minimum; if it is too large, the learning process oscillates, with little chance of convergence to the necessary MSE. The training of a network is aimed at its generalization performance, which is achieved through system convergence, the speed of which is adjusted by the learning rates. To appreciate the effects of large learning rates, consider the learning curve of a network with 9 hidden layer PEs, a weight and threshold initialization range of -0.5 to +0.5, and a learning rate of 5, as depicted in Fig. 10.

Fig. 10: Learning curve at a high learning rate (5)

To assess the effect of the learning rate on system performance, we used a network with 6 hidden layer PEs, a range of -0.5 to +0.5 for weight and threshold initialization, an MSE of 0.05, and various learning rates. With 10% random noise, the EOR plots are as follows, corresponding to learning rates of 0.1, 1, and 1.2, respectively. The network did not converge with a learning rate of 2.0.

Fig. 11: EOR plots at learning rates of 0.1, 1, and 1.2, respectively

4.5 Thresholds

Thresholds, or biases, can be used on both the hidden layer and the output layer PEs to fine-tune system convergence. Each PE on the hidden and output layers can have a threshold value, which is updated directly based on the delta value computed for that PE. Threshold updating not only speeds up system convergence, but is also potentially helpful in smoothing out system fluctuations that might be hard to deal with using weight updating alone.

o_i = f( Σ_{j=1}^{n} a_j w_{ij} − θ_i ),   (1)

where o_i is the output of the i-th node on the hidden or output layer, θ_i is the corresponding threshold, and f is the sigmoid function. If δ_i is the delta value for the node, θ_i is updated as follows:

θ_i(t) = θ_i(t−1) − ε δ_i,   (2)

where ε is the threshold learning rate, δ_i is the delta value, and θ_i is the threshold value.
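
Equations (1) and (2) translate directly into the update step. A sketch for a single pattern, assuming the standard sigmoid delta rule for the weights (the weight update itself is textbook backpropagation and is not spelled out in the paper); eta is the weight learning rate and eps the threshold learning rate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_step(w1, th1, w2, th2, x, t, eta=1.0, eps=1.0):
    # Forward pass, Eq. (1): o_i = f(sum_j a_j w_ij - theta_i).
    h = sigmoid(w1 @ x - th1)
    o = sigmoid(w2 @ h - th2)
    # Delta values (the sigmoid derivative is o * (1 - o)).
    delta_o = (t - o) * o * (1.0 - o)
    delta_h = (w2.T @ delta_o) * h * (1.0 - h)
    # Weight updates with learning rate eta (standard delta rule).
    w2 += eta * np.outer(delta_o, h)
    w1 += eta * np.outer(delta_h, x)
    # Threshold updates, Eq. (2): theta_i(t) = theta_i(t-1) - eps * delta_i.
    th2 -= eps * delta_o
    th1 -= eps * delta_h
```

Setting eps = 0 recovers weight-only updating, which is the comparison implicit in the section's remark about smoothing out fluctuations.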

5 Conclusions

As there are no formulas that can be readily used to evaluate the performance of a backpropagation network, Equal Opportunity for Recognition (EOR) plots represent a practical tool for system assessment under the conditions of the application. As a probabilistic method, not only can EOR be used to describe system performance, it can also be incorporated into the recall process for demanding pattern recognition tasks. EOR has shown great promise in finding optimal initial conditions for our neural network. Future work can be directed at finding general principles for designing a backpropagation network with near-optimal initial conditions using EOR.

References:
[1] Freeman, James A. Simulating Neural Networks. Addison-Wesley, 1994.
[2] McAuley, Devin. The Backpropagation Network: Learning by Example, 1997.
[3] Mehrotra, Kishan, et al. Elements of Artificial Neural Networks. Cambridge, MA: MIT Press, 1997.
[4] Sureerattanan, Songyot, et al. New Developments on Backpropagation Network Training. IEICE Trans., vol. E83-A, no. 6, June 2000.
[5] Kolen, John F., and Jordan B. Pollack. Back Propagation is Sensitive to Initial Conditions. 1990.
[6] Rumelhart, D. E., G. E. Hinton, and R. J. Williams. Learning Representations by Back-Propagating Errors. Nature 323:533-536, 1986.
[7] Sarle, Warren S. ftp://ftp.sas.com/pub/neural.