A Class of Instantaneously Trained Neural Networks

Subhash Kak
Department of Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA 70803-5901
E-mail address: kak@ece.lsu.edu

May 7, 2002

Abstract

This paper presents FC networks, instantaneously trained neural networks that allow rapid learning of non-binary data. These networks, which generalize the earlier CC networks, have been compared against Backpropagation (BP) and Radial Basis Function (RBF) networks and show excellent performance for time-series prediction and pattern recognition. The networks can generalize using soft or hard decisions.

1 Introduction

The popular Backpropagation (BP) and Radial Basis Function (RBF) neural networks require iterative training. BP networks sometimes do not converge, or take too long to train to be useful in real-time applications. BP networks have been proposed as models of biological memory, but given the time it takes these networks to learn, it is clear they could never model short-term, instantaneously learned, memory.

Short-term and long-term memories form a complementary pair in biological functioning. Similarly, quick learning using artificial neural networks should be useful for many engineering applications. It was the motivation to model short-term memory that led to the development of the corner classification (CC) family of networks [1,2], which learn instantaneously and have very good generalization performance. Further work was motivated by an invitation from Karl Pribram to speak on the different languages of the brain, which led me to analyze the associative aspects of memory at some length [3]. Further development of these ideas followed in due course [4,5,7]. These networks learn example patterns and then generalize to points that lie in a sphere of fixed radius around them, without a specific strategy for points that lie outside the sphere. It has been shown that these networks are hardware friendly and suited for implementation in reconfigurable computing using fine-grained parallelism [9].

Although the CC networks have found many applications in engineering and business [5], they suffer from the disadvantage that their input and output must be discrete. Another disadvantage of the CC networks is that the input is best presented in a unary code, which increases the number of input neurons considerably. Furthermore, the degree of generalization achieved at each trained node is, in the non-adaptive version of the CC network, kept constant. In reality, the data might require that the amount of generalization vary from node to node. These characteristics somewhat limit the applicability of CC networks.

In the present paper we describe a generalization of the CC network that we call the FC network [6]. This network can operate on real data directly and offers greater flexibility than the CC networks. The FC networks are described in detail elsewhere [6,8]. Here our objective is to focus on their fundamental structure to highlight how they are distinct from other approaches to neural network design and learning.

2 The Network

The FC network maps data to the nearest neighbor within whose sphere of generalization the data falls; if the data falls within no such sphere, it performs an interpolation based on the k nearest neighbors. As k becomes large, the FC generalization can be shown to approach the Bayes performance [6].

The network uses the fully-connected feedforward architecture, consisting of a layer of input nodes, a layer of hidden neurons, a rule base, and an output layer (Figure 1). The number of output neurons is determined by the problem specifications. A network with multiple output neurons can always be separated into multiple single-output networks.

An instantaneously trained neural network must, by definition, convert the incoming pattern into corresponding weights without intensive computation. This rules out any technique that is based on a reduced-dimension representation, and the network must depend on a linear mapping of the data. It is this linearity requirement that compels the use of unary mapping in CC networks. However, the transformation of the data within the network cannot be entirely linear, because that would make it impossible for the network to generalize. The networks have a nonlinear generalizing region, and elsewhere they perform nonlinear interpolation mapping.

In making the transition from binary to non-binary networks, we find that by abandoning the inefficient unary mapping of the data we are able to use real values as they stand. Surprisingly, this generalization allows us to cut down on the size of the network. The price that is paid is the extra computation necessary to separate patterns in the signal space. We want the network to work in two steps:

1. A basic architecture based on the exemplar patterns
2. A tuning of the network to make the patterns consistent

The CC networks merely learn the exemplar patterns, without a systematic procedure for tuning the generalization process.

In the FC networks, input data are normalized to the range [0, 1] and presented to the network in the form of an R-element continuous-valued vector x = (x_1, x_2, ..., x_R), where R is determined by the problem specification. Like the CC network, the number of hidden neurons S in an FC network is equal to the number of training samples the network is required to learn. Each hidden neuron i (i = 1, 2, ..., S) is associated with a weight vector w_i = (w_{i,1}, w_{i,2}, ..., w_{i,R}), where w_{i,j} is the weight from source x_j to the ith hidden neuron.
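Since the number of hidden neurons equals the number of training samples and each hidden neuron is associated with one exemplar, a natural reading of the first training pass (described below) is that it simply copies each normalized training input into the corresponding hidden weight vector and each target value into the output weight vector. The following Python sketch illustrates this reading; the class name FCNetwork and its attribute names are illustrative, not from the paper.

```python
import numpy as np

class FCNetwork:
    """Minimal data-structure sketch of an FC network (names are illustrative)."""

    def __init__(self):
        self.W = None   # S x R array: one hidden weight vector w_i per training sample
        self.u = None   # length-S output weight vector (the stored target outputs)
        self.r = None   # length-S radii of generalization, set in the second pass

    def first_pass(self, X, y):
        """First training pass (sketch): assign input and output weights
        directly from the exemplars, with no iterative optimization."""
        X = np.asarray(X, dtype=float)               # training inputs, assumed normalized to [0, 1]
        self.W = X.copy()                            # w_i = i-th training input
        self.u = np.asarray(y, dtype=float).copy()   # u_i = i-th training output
        self.r = np.zeros(len(X))                    # placeholders until the second pass
        return self
```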

Figure 1: An FC network. The square boxes, F, adjust the generalization radii.

Each hidden neuron i first computes the normalized Euclidean distance d_i between its weight vector w_i and the test vector x. This distance, together with r_i, the radius of generalization for this hidden neuron, constitutes the input to the activation function F, which determines the hidden neuron output h_i. The outputs of all hidden neurons taken together form the distance vector h = (h_1, h_2, ..., h_S), which gives a measure of similarity between the test vector and each of the training vectors the network has already learned.

The rule base enables the FC network to generalize. It consists of a set of IF-THEN rules that operate on the distance vector h to produce a membership grade vector µ = (µ_1, µ_2, ..., µ_S), which indicates to what degree the test vector x belongs to each of the network output classes. The output neuron then computes the dot product between the output weight vector u = (u_1, u_2, ..., u_S) and the membership vector µ to produce the generalized network output y corresponding to the test vector x.

Training of an FC network involves two distinct steps. The first step determines the input and output weights, while the second step finds the radius of generalization for each hidden neuron.
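The paper does not spell out the activation F or the individual IF-THEN rules here; it characterizes the rule base only through its behavior (acting as a 1NN classifier inside a sphere of generalization and interpolating by distance outside, as described below). The sketch below therefore codes that behavior directly rather than as explicit rules, and it weights all stored exemplars in the interpolation, as in the worked example; restricting to the k nearest would give the kNN variant mentioned in the text.

```python
import numpy as np

def fc_output(x, W, u, r):
    """Hedged sketch of the FC forward pass: distances to the stored exemplars,
    a membership grade vector from the rule-base behavior, and the output
    y = u . mu from the output neuron."""
    x = np.asarray(x, dtype=float)
    d = np.linalg.norm(np.asarray(W, dtype=float) - x, axis=1)  # distance to each exemplar

    mu = np.zeros(len(d))                     # membership grade vector
    inside = d <= np.asarray(r, dtype=float)  # spheres of generalization containing x
    if inside.any():
        # x falls inside at least one sphere: act as a 1NN classifier
        mu[np.argmin(np.where(inside, d, np.inf))] = 1.0
    else:
        # outside all spheres: soft generalization by inverse-distance interpolation
        mu = (1.0 / d) / np.sum(1.0 / d)

    return float(np.dot(u, mu))               # output neuron: y = u . mu
```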

Unlike the neurons in a CC4 network, the input vector x, the weight vector w, and the scalar output h of an FC neuron are all continuous-valued quantities. The quantity d is the Euclidean distance between x and w. The parameter r is called the radius of generalization, a term borrowed from the CC4 network. Its purpose is to allocate to the hidden neuron an area of generalization in the input space with radius r and center at w. Unlike the CC4 network, where the same r applies to the whole network, the radius of generalization in an FC network is determined individually for each hidden neuron during training.

The network can be trained with just two passes over the training samples. The first pass assigns the synaptic weights for the input and output layers. The second pass determines the radius of generalization r for each training sample. The notion of a radius of generalization for each training vector provides a basis for the network to switch between a 1NN classifier and a kNN classifier during generalization. The network behaves like a 1NN classifier when the input vector falls within the area of generalization of a training vector, and like a kNN classifier otherwise. This enables the network to benefit from the strengths of both classifiers.

The network can generalize in a variety of ways. In earlier work [6,8], we have considered soft generalization, which makes it possible to interpolate outside of the generalization spheres. This is shown in Figure 2(a) for three learned examples labeled 1, 2, and 3. The radius of each sphere is exactly one-half of the distance to the nearest other exemplar pattern. But the entire space can also be divided into specific classes, using hard decision boundaries, as in Figure 2(b). Which of the two methods to use depends on the nature of the problem and the geometry of the output values.

Example. Consider the following input-output training samples:

Sample    Input            Output
1         (2, 3, 4, -1)    7
2         (1, -1, 2, -3)   4
3         (3, 0, 1, 8)     9

The Euclidean distances between the samples are d_12 = 5, d_13 = 10, and d_23 = 11.27. The radius of generalization for a node is one-half the least distance to the other samples. Therefore, the radii associated with the three hidden nodes are 2.5, 2.5, and 5, respectively.
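This radius assignment (the second training pass) is easy to check numerically. The short sketch below, with the illustrative function name second_pass, reproduces the radii 2.5, 2.5, and 5 for the three samples above; as in the paper's own example, the raw inputs are used without [0, 1] normalization.

```python
import numpy as np

def second_pass(W):
    """Second training pass (sketch): each hidden neuron's radius of generalization
    is one-half the distance to the nearest other exemplar."""
    W = np.asarray(W, dtype=float)
    D = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(D, np.inf)                                # ignore self-distance
    return D.min(axis=1) / 2.0

radii = second_pass([[2, 3, 4, -1],
                     [1, -1, 2, -3],
                     [3, 0, 1, 8]])
print(radii)   # [2.5 2.5 5.], matching the example
```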

Figure 2: (a) Soft generalization together with interpolation; (b) hard generalization with separated decision regions.

Now consider an input vector X = (1, 2, 3, 4) whose output we wish to compute. This vector is at Euclidean distances d_1 = 5.29, d_2 = 7.68, and d_3 = 5.29 from the three stored vectors. Since these distances are larger than the radii of generalization of the three vectors, we compute the interpolation weights

µ_i = (1/d_i) / (1/d_1 + 1/d_2 + 1/d_3).

This gives us µ_1 = 0.372, µ_2 = 0.256, and µ_3 = 0.372. So the output is

0.372 × 7 + 0.256 × 4 + 0.372 × 9 = 6.976.

This is an example of the use of soft generalization.

The FC network possesses generalization characteristics that compare favorably with those of other neural networks such as BP and RBF networks [6,8]. The design of an FC network for any real-world problem is very easy compared to that of BP and RBF networks. It therefore offers an attractive alternative to these networks.
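The worked example can be verified directly with a few lines of code. This standalone check (again a sketch, not the paper's implementation) reproduces the distances, the interpolation weights, and the output of about 6.976.

```python
import numpy as np

W = np.array([[2, 3, 4, -1],
              [1, -1, 2, -3],
              [3, 0, 1, 8]], dtype=float)   # stored exemplar inputs
u = np.array([7, 4, 9], dtype=float)        # stored outputs
radii = np.array([2.5, 2.5, 5.0])           # radii from the second pass

x = np.array([1, 2, 3, 4], dtype=float)     # test vector X
d = np.linalg.norm(W - x, axis=1)           # approximately [5.29, 7.68, 5.29]

if np.any(d <= radii):
    y = float(u[np.argmin(d)])              # inside a sphere: 1NN behavior
else:
    mu = (1.0 / d) / np.sum(1.0 / d)        # approximately [0.372, 0.256, 0.372]
    y = float(np.dot(u, mu))                # soft generalization

print(y)                                    # approximately 6.976
```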

3 Concluding Remarks

The paper has presented the outline of a method for instantaneous learning of patterns by a neural network that works for both binary and non-binary data. This method is a direct generalization of the CC family of networks, in which the actual distances between the exemplars are computed to determine the radius of generalization associated with each hidden node.

The generalization strategy must depend on the nature of the data. Further research on soft generalization options related to problem geometry needs to be carried out, since the earlier work [6,8] has considered only one specific method. The method described in this paper has the interesting property that the tuning of the generalization procedure is carried out later. The tuning could include a reduction of the dimensions of the problem space, which is not described in this paper. It can also be enhanced by means of node pruning techniques. Perhaps it is tuning of this kind that transforms a short-term memory into a long-term one.

Acknowledgement

This paper is a summary of the plenary lecture delivered at JCIS 2002 in Durham, North Carolina, in March 2002.

References

[1] S. Kak, On training feedforward neural networks. Pramana - Journal of Physics 40 (1993) 35-42.

[2] S. Kak, New algorithms for training feedforward neural networks. Pattern Recognition Letters 15 (1994) 295-298.

[3] S. Kak, The three languages of the brain: quantum, reorganizational, and associative. In Learning as Self-Organization, K. Pribram and J. King (eds.), Lawrence Erlbaum, Mahwah, 1996, pp. 185-219.

[4] S. Kak, On generalization by neural networks. Information Sciences 111 (1998) 293-302.

[5] S. Kak, Better web searches and prediction with instantaneously trained neural networks. IEEE Intelligent Systems 14(6) (1999) 78-81.

[6] K.W. Tang, Instantaneous Learning Neural Networks. Ph.D. Dissertation, Louisiana State University, 1999.

[7] K.W. Tang and S. Kak, A new corner classification approach to neural network training. Circuits, Systems, and Signal Processing 17 (1998) 459-469.

[8] K.W. Tang and S. Kak, Fast classification networks for signal processing. Circuits, Systems, and Signal Processing 21 (2002) 207-224.

[9] J. Zhu and G. Milne, Implementing Kak neural networks on a reconfigurable computing platform. In FPL 2000, LNCS 1896, R.W. Hartenstein and H. Gruenbacher (eds.), Springer-Verlag, 2000, pp. 260-269.