A Learning Algorithm for Piecewise Linear Regression


Giancarlo Ferrari-Trecate 1, Marco Muselli 2, Diego Liberati 3, Manfred Morari 1

1 Institut für Automatik, ETHZ - ETL, CH-8092 Zürich, Switzerland
2 Istituto per i Circuiti Elettronici - CNR, via De Marini 6, 16149 Genova, Italy
3 Ce.S.T.I.A. - CNR, c/o Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy

Abstract

A new learning algorithm for solving piecewise linear regression problems is proposed. It is able to train a proper multilayer feedforward neural network so as to reconstruct a target function that assumes a different linear behavior on each set of a polyhedral partition of the input domain. The proposed method combines local estimation, clustering in weight space, classification and regression in order to achieve the desired result. A simulation on a benchmark problem shows the good properties of this new learning algorithm.

1 Introduction

Real-world problems to be solved by artificial neural networks are normally subdivided into two groups according to the range of values assumed by the output. If it is Boolean or nominal, we speak of classification problems; otherwise, when the output is coded by a continuous variable, we are facing a regression problem. In most cases, the techniques employed to train a connectionist model depend on the kind of problem we are dealing with. However, applications can be found which lie on the borderline between classification and regression; these occur when the input space can be subdivided into disjoint regions X_i characterized by different behaviors of the function f to be reconstructed. The target of the learning problem is consequently twofold: by analyzing a set of samples of f, possibly affected by noise, it has to generate both the collection of regions X_i and the behavior of the unknown function f in each of them. If the region X_i corresponding to each sample in the training set were known, we could add the index i of the region as an output, thus obtaining a classification problem whose target is to find the effective form of each X_i. Conversely, if the actual partition X_i were known, we could solve several regression problems to find the behavior of the function f within each X_i.

Because of this mixed nature, classical techniques for neural network training cannot be directly applied; specific methods are necessary to deal with this kind of problem. Perhaps the simplest situation one can think of is piecewise linear regression: in this case the regions X_i are polyhedra and the behavior of the function f in each X_i can be modeled by a linear expression. Several authors have treated this kind of problem [2, 3, 4, 8], providing algorithms for reaching the desired result. Unfortunately, most of them are difficult to extend beyond two dimensions [2], whereas others consider only local approximations [3, 4], thus missing the effective extension of the regions X_i. In this contribution a new training algorithm for neural networks solving piecewise linear regression problems is proposed. It combines clustering and supervised learning to obtain the correct values for the weights of a proper multilayer feedforward architecture.

2 The piecewise linear regression problem

Let X be a polyhedron in the n-dimensional space R^n and let X_i, i = 1, ..., s, be a polyhedral partition of X, i.e. $X_i \cap X_j = \emptyset$ for every $i \neq j$ and $\bigcup_{i=1}^{s} X_i = X$. The target of a Piecewise Linear Regression (PLR) problem is to reconstruct an unknown function $f : X \to \mathbb{R}$ having a linear behavior in each region X_i,

$$f(x) = z_i = w_{i0} + \sum_{j=1}^{n} w_{ij} x_j \qquad \text{if } x \in X_i,$$

when only a training set S containing m samples (x_k, y_k), k = 1, ..., m, is available. The output y_k gives an evaluation of f(x_k) subject to noise, with x_k ∈ X; the region X_i to which x_k belongs is not known in advance. The scalars w_{i0}, w_{i1}, ..., w_{in}, for i = 1, ..., s, characterize the function f uniquely, and their estimate is a target of the PLR problem; for notational purposes they will be collected in a vector w_i. Since the regions X_i are polyhedral, each of them can be defined by a set of l_i linear inequalities of the following kind:

$$a_{ij0} + \sum_{k=1}^{n} a_{ijk} x_k \geq 0, \qquad j = 1, \ldots, l_i. \qquad (1)$$

The scalars a_{ijk}, for j = 1, ..., l_i and k = 0, 1, ..., n, can be collected in a matrix A_i, whose estimate is also a target of the reconstruction process for every i = 1, ..., s. Discontinuities may be present in the function f at the boundaries between two regions X_i.

Following the general idea presented in [8], a neural network realizing a piecewise linear function f of this kind can be modeled as in Fig. 1. It contains a gate layer that verifies the inequalities (1) and decides which of the terms z_i must be used as the output y of the whole network. Thus, the i-th unit in the gate layer has output equal to its input z_i if all the constraints (1) are satisfied for j = 1, ..., l_i, and equal to 0 otherwise. All the other units perform a weighted sum of their inputs; the weights of the output neuron, which has no bias, are always set to 1.

[Figure 1: General neural network realizing a piecewise linear function.]
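For illustration, here is a minimal sketch of how such a network computes its output. This code is not part of the original paper; the function name plr_output and the encoding of A_i and w_i as NumPy arrays are our own conventions.

```python
import numpy as np

def plr_output(x, A_list, W):
    """Output of the network in Fig. 1 at a point x in R^n.

    A_list[i]: the l_i x (n+1) matrix A_i, one row per inequality (1).
    W[i]: the weight vector w_i = (w_i0, w_i1, ..., w_in) of hidden unit i.
    """
    xe = np.concatenate(([1.0], x))   # prepend 1 to absorb the bias terms
    for A_i, w_i in zip(A_list, W):
        # Gate unit i passes z_i through only if all inequalities (1) hold,
        # i.e. if x lies in the region X_i; otherwise it outputs 0.
        if np.all(A_i @ xe >= 0.0):
            return float(w_i @ xe)    # z_i = w_i0 + sum_j w_ij x_j
    return 0.0                        # x outside every region: all gates closed
```

Since the X_i form a partition, returning the first matching z_i is equivalent to the weighted sum (with unit weights) over all gate outputs described above.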

3 The proposed learning algorithm

As previously noted, the solution of a PLR problem requires a technique that combines classification and regression: the former has the aim of finding the matrices A_i to be inserted in the gate layer of the neural network (Fig. 1), whereas the latter provides the weight vectors w_i for the input-to-hidden-layer connections. A method of this kind is reported in Fig. 2; it is composed of four steps, each of which is devoted to a specific task.

The first of them (Step 1) has the aim of obtaining a first estimate of the weight vectors w_i by performing local linear regressions based on small subsets of the whole training set S. In fact, points x_k that are close to each other are likely to belong to the same region X_i. Then, for each sample (x_k, y_k), with k = 1, ..., m, we build a set C_k containing (x_k, y_k) and the c - 1 distinct pairs (x, y) ∈ S that score the lowest values of the distance ||x_k - x||. The parameter c can be freely chosen, though the inequality c ≥ n + 1 must be respected to perform the linear regression. It can be easily seen that some sets C_k, called mixed, will contain input patterns belonging to different regions X_i. They lead to wrong estimates for w_i and consequently their number must be kept to a minimum; this can be obtained by lowering the value of c. However, the quality of the estimate improves when the size c of the sets C_k increases; a tradeoff must therefore be attained in selecting a reasonable value for c.
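To make Step 1 concrete, a minimal sketch in Python follows (again not from the paper; the name local_weight_vectors and the NumPy-based encoding are ours). It forms each set C_k from x_k and its c - 1 nearest neighbors and fits a linear unit by least squares:

```python
import numpy as np

def local_weight_vectors(X, y, c):
    """Step 1: fit one linear unit on each local set C_k.

    X: m x n matrix of input patterns, y: m outputs, with c >= n + 1.
    Returns an m x (n+1) matrix whose k-th row is v_k = (v_k0, ..., v_kn).
    """
    m, n = X.shape
    Xe = np.hstack([np.ones((m, 1)), X])   # prepend a bias column of ones
    V = np.empty((m, n + 1))
    for k in range(m):
        # C_k: the pair (x_k, y_k) plus its c - 1 nearest neighbors in S
        idx = np.argsort(np.linalg.norm(X - X[k], axis=1))[:c]
        # local least-squares regression on the samples in C_k
        V[k] = np.linalg.lstsq(Xe[idx], y[idx], rcond=None)[0]
    return V
```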

ALGORITHM FOR PIECEWISE LINEAR REGRESSION

1. (Local regression) For every k = 1, ..., m do:
   1a. Form the set C_k containing the pair (x_k, y_k) and the samples (x, y) ∈ S associated with the c - 1 nearest neighbors x to x_k.
   1b. Perform a linear regression to obtain the weight vector v_k of a linear unit fitting the samples in C_k.
2. (Clustering) Perform a clustering process in the space R^{n+1} to subdivide the set of weight vectors v_k into s groups V_i.
3. (Classification) Build a new training set S' containing the m pairs (x_k, i_k), where V_{i_k} is the cluster including v_k. Train a multicategory classification method to produce the matrices A_i for the regions X_i.
4. (Regression) For every i = 1, ..., s perform a linear regression on the samples (x, y) ∈ S with x ∈ X_i to obtain the weight vector w_i for the i-th unit in the hidden layer.

Figure 2: Proposed learning method for piecewise linear regression.

Denote by v_k the weight vector of the linear unit produced through the linear regression on the samples in C_k. If the generation of the samples in the training set is not affected by noise, most of the v_k coincide with the desired weight vectors w_i; only mixed sets C_k yield spurious vectors v_k, which can be considered as outliers. Nevertheless, even in the presence of noise, a clustering algorithm (Step 2) can be used to determine the sets V_i of vectors v_k associated with the same w_i. A proper version of the K-means algorithm [6] can be adopted to this aim if the number s of regions is fixed beforehand; otherwise, adaptive techniques such as the Growing Neural Gas [7] can be employed to find the value of s at the same time.

The sets V_i generated by the clustering process induce a classification of the input patterns x_k belonging to the training set S. As a matter of fact, if v_k ∈ V_i for a given i, the set C_k is fitted by the linear neuron with weight vector w_i, and consequently x_k is located in the region X_i. The effective extension of this region can be determined by solving a linear multicategory classification problem (Step 3), whose training set S' is built by adding to each input pattern x_k, as output, the index i_k of the set V_{i_k} to which the corresponding vector v_k belongs. To avoid the presence of multiply classified points or of unclassified patterns in the input space, proper techniques [1] based on linear and quadratic programming can be employed. In this way the s matrices A_i for the gate layer are generated; they can include redundant rows that are not necessary for the determination of the polyhedral regions X_i. Such rows can be removed by applying standard linear programming techniques.

Finally, the weight vectors w_i for the neural network in Fig. 1 can be directly obtained by solving s linear regression problems (Step 4), having as training sets the samples (x, y) ∈ S with x ∈ X_i, where X_1, ..., X_s are the regions built by the classification process.
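The remaining steps can be sketched as follows (our code, not the paper's). Plain K-means stands in for the modified K-means with a problem-specific norm of [6] at Step 2, and a multinomial logistic regression stands in for the multicategory SVM technique of [1] at Step 3; both substitutions keep the region boundaries linear but do not reproduce those methods exactly. Step 4 refits each w_i by least squares on the samples assigned to its region:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def plr_fit(X, y, V, s):
    """Steps 2-4: cluster the v_k, learn the regions X_i, refit the w_i."""
    m, n = X.shape
    Xe = np.hstack([np.ones((m, 1)), X])

    # Step 2: subdivide the vectors v_k in R^{n+1} into s groups V_i
    labels = KMeans(n_clusters=s, n_init=10).fit_predict(V)

    # Step 3: linear multicategory classification of the patterns x_k;
    # the classifier's linear decision regions play the role of the A_i
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    # Step 4: one least-squares regression per region X_i, on the
    # samples that the classifier assigns to that region
    region = clf.predict(X)
    W = np.vstack([np.linalg.lstsq(Xe[region == i], y[region == i],
                                   rcond=None)[0] for i in range(s)])
    return clf, W
```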

4 Simulation results

The proposed algorithm for piecewise linear regression has been tested on a one-dimensional benchmark problem, in order to analyze the quality of the resulting neural network. The unknown function to be reconstructed is

$$f(x) = \begin{cases} -x & \text{if } -4 \le x \le 0 \\ x & \text{if } 0 < x < 2 \\ 2 + 3x & \text{if } 2 \le x \le 4 \end{cases} \qquad (2)$$

with X = [-4, 4] and s = 3. A training set S containing m = 100 samples (x, y) has been generated, where y = f(x) + ε and ε is a normal random variable with zero mean and variance σ² = 0.05. The behavior of f(x) together with the elements of S is depicted in Fig. 3a.

[Figure 3: Simulation results for a benchmark problem: a) unknown piecewise linear function f and training set S; b) function realized by the trained neural network (dashed line).]

The method described in Fig. 2 has been applied by choosing at Step 1 the value c = 6. At Step 2 the number s of regions has been supposed to be known, thus allowing the application of the K-means clustering algorithm [5]; a proper definition of the norm has been employed to improve the convergence of the clustering process [6]. Multicategory classification (Step 3) has then been performed by using the method described in [1], which can be easily extended to realize nonlinear boundaries among the X_i when treating a multidimensional problem.
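The benchmark experiment can be reproduced roughly as below. This is our sketch, not the paper's code: it reuses the hypothetical helpers local_weight_vectors and plr_fit from the previous listings and the reconstruction of equation (2) given above (minus signs recovered from the garbled source).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """The benchmark function (2) on X = [-4, 4]."""
    return np.where(x <= 0, -x, np.where(x < 2, x, 2 + 3 * x))

# m = 100 noisy samples: y = f(x) + eps, eps ~ N(0, sigma^2), sigma^2 = 0.05
X = rng.uniform(-4.0, 4.0, size=(100, 1))
y = f(X[:, 0]) + rng.normal(0.0, np.sqrt(0.05), size=100)

V = local_weight_vectors(X, y, c=6)   # Step 1, with c = 6 as in the paper
clf, W = plr_fit(X, y, V, s=3)        # Steps 2-4, with s = 3 regions
print(W)                              # one row (w_i0, w_i1) per region
```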

Finally, least squares estimation has been adopted at Step 4 to generate the vectors w_i for the piecewise linear regression. The resulting neural network realizes the following function, represented as a dashed line in Fig. 3b:

$$\hat{f}(x) = \begin{cases} 0.0043 - 0.9787\,x & \text{if } -4 \le x \le -0.24 \\ 0.0899 + 0.9597\,x & \text{if } -0.24 < x < 2.12 \\ 1.8208 + 3.0608\,x & \text{if } 2.12 \le x \le 4 \end{cases}$$

As one can note, this is a good approximation of the unknown function (2). Errors can only be detected at the boundaries between two adjacent regions X_i; they are mainly due to the effect of mixed sets C_k on the classification process.

References

[1] E. J. Bredensteiner and K. P. Bennett, Multicategory classification by support vector machines. Computational Optimization and Applications, 12 (1999) 53-79.
[2] V. Cherkassky and H. Lari-Najafi, Constrained topological mapping for nonparametric regression analysis. Neural Networks, 4 (1991) 27-40.
[3] C.-H. Choi and J. Y. Choi, Constructive neural networks with piecewise interpolation capabilities for function approximation. IEEE Transactions on Neural Networks, 5 (1994) 936-944.
[4] J. Y. Choi and J. A. Farrell, Nonlinear adaptive control using networks of piecewise linear approximators. IEEE Transactions on Neural Networks, 11 (2000) 390-401.
[5] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: John Wiley and Sons (1973).
[6] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari, A clustering technique for the identification of piecewise affine systems. Accepted at the Fourth International Workshop on Hybrid Systems: Computation and Control, Rome, Italy, March 28-30, 2001.
[7] B. Fritzke, A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems 7 (1995), Cambridge, MA: MIT Press, 625-632.
[8] K. Nakayama, A. Hirano, and A. Kanbe, A structure trainable neural network with embedded gating units and its learning algorithm. In Proceedings of the International Joint Conference on Neural Networks (2000), Como, Italy, 253-258.