Identification of Hybrid and Linear Parameter Varying Models via Recursive Piecewise Affine Regression and Discrimination

Valentina Breschi, Alberto Bemporad and Dario Piga

Abstract: Piecewise affine (PWA) regression is a supervised learning method which aims at estimating, from a set of training data, a PWA map approximating the relationship between a set of explanatory variables (commonly called regressors) and continuous-valued outputs. PWA maps are described by a finite collection of affine functions, defined over a polyhedral partition of the regressor space. The PWA regression problem amounts to estimating both the affine functions and the polyhedral partition from data. In this paper, we describe a recursive and numerically efficient PWA regression algorithm based on a combined use of (i) clustering and multi-model recursive least-squares techniques and (ii) linear multi-category discrimination. Then, we discuss its application to the identification of multi-input multi-output PWA dynamical models in autoregressive form and to the identification of linear parameter-varying (LPV) models. In the former identification problem, the regressor is defined as a collection of past input and output observations, and local linear time-invariant models are estimated together with the partition of the regressor space. In the latter problem, PWA approximations of the scheduling-variable-dependent coefficients of an LPV model are estimated from the data, along with a partition of the scheduling variable domain.

I. INTRODUCTION

One of the most challenging problems in regression learning is choosing a proper structure to describe the underlying relation between input (or regressor) vectors x in R^{n_x} and continuous-valued target outputs y in R^{n_y}. On the one hand, too simple a model (e.g., a linear function of the independent variable x) might not be flexible enough to describe the input/output relationship, leading to a systematic error when predicting the output for a given input. On the other hand, complex model structures tend to be computationally demanding and to overfit the training data, leading to poor generalization to unseen data. This problem is commonly referred to in machine learning and system identification as the bias-variance tradeoff [19]. Several methods exist in the literature to deal with the bias-variance tradeoff. A first possibility is to use overparameterized models, described by a large set of a priori fixed nonlinear basis functions, of which only a few might actually be needed for an accurate approximation of the underlying relationship among the variables. Sparse estimation methods like the LASSO [18] are then used to identify a low-complexity model, by minimizing the fitting error along with a penalty term that penalizes the complexity of the model. However, the final estimate strongly depends on the chosen basis functions. Motivated by the need for simple and flexible model structures, algorithms for computing PieceWise Affine (PWA) approximations of nonlinear regression functions have been developed.

(This work was partially supported by the European Commission under the H2020-SPIRE project DISIRE - Distributed In-Situ Sensors Integrated into Raw Material and Energy Feedstock. The authors are with the IMT Institute for Advanced Studies Lucca, Lucca, Italy.)
Piecewise affine functions are obtained by partitioning the regressor space into a finite number of polyhedral regions and by considering an affine submodel on each polyhedron. Both the partition of the regressor space and the parameters defining each affine submodel are inferred from data. PWA regression algorithms can also be extended to the identification of hybrid and PieceWise Affine autoregressive with exogenous inputs (PWARX) systems by defining the regressor as the collection of past input and output observations (see [15] for a literature overview). Among the algorithms developed for the identification of hybrid and PWARX systems, we mention the bounded-error methods [7], [14], the clustering-based procedures [2], [9], [12], the mixed-integer quadratic programming approach [17], the Bayesian method [10], and the sparse optimization method [13]. All these algorithms are executed in a batch fashion, and thus they are not suitable for an on-line implementation, apart from the method in [2]. This paper proposes an alternative, numerically efficient algorithm for the identification of discrete-time multi-input multi-output PWARX models. The same algorithm can be properly extended to the identification of Linear Parameter-Varying (LPV) models. In this case, a PWA approximation of the scheduling-variable-dependent LPV model coefficients is estimated from the training data, along with a polyhedral partition of the scheduling variable domain. The proposed identification approach relies on the two-stage PWA regression algorithm recently developed by the authors and detailed in [6]. The main idea of the algorithm is to sequentially process the observed regressor/output pairs and assign each pair to the affine submodel that is most compatible with it. At the same time, while a pair is being assigned, the parameters of the corresponding submodel are updated via a recursive least-squares technique based on inverse QR decomposition. The second stage starts once all the observations have been classified. At this stage, the regressor domain is partitioned into polyhedral regions through a piecewise linear separator, which is computed by solving a convex unconstrained optimization problem, whose solution can be efficiently computed either off-line through a regularized piecewise-smooth Newton method or recursively through an averaged stochastic gradient descent algorithm.

Overall, the proposed identification approach turns out to be computationally very effective for offline learning, and suitable for online learning. The paper is organized as follows. The general PWA regression problem is formulated in Section II. The specific problems of data-driven modeling of discrete-time PWARX models and LPV models are formulated in Section II-B and Section II-C, respectively, extending the contribution in [6] by showing its applicability to system identification problems. Section III summarizes the general PWA regression method, which consists of an algorithm to simultaneously cluster the observed regressors and update the model parameters, and of two multi-category discrimination algorithms to compute the polyhedral partition of the regressor domain (more details on these algorithms can be found in [6]). The application of the PWA regression algorithm described in Section III to the identification of PWARX and LPV models is discussed in Section IV through two numerical examples. Concluding remarks and some directions for future research are drawn in Section V.

A. Notation

Let R^n be the set of real vectors of dimension n. Let I be a finite set of integers and denote by |I| the cardinality of I. Given a vector a in R^n, a_i denotes the i-th entry of a, a_I the subvector obtained by collecting the entries a_i for all i in I, ||a|| the Euclidean norm of a, and a_+ the vector whose i-th element is max{a_i, 0}. Given two vectors a, b in R^n, max(a, b) is the vector whose i-th component is max{a_i, b_i}. Given a matrix A in R^{n x m}, A' denotes the transpose of A and A_i the i-th row of A. Given two matrices A and B, A (x) B denotes the Kronecker product of A and B. I_n is the identity matrix of size n and 1_n the n-dimensional column vector of ones.

II. PROBLEM STATEMENT

A. Piecewise affine regression

Consider the vector-valued PWA function f : X -> R^{n_y} defined as

  f(x) = A_i [x' 1]'  if x in X_i,  i = 1, ..., s,   (1)

where x in R^{n_x}, X is a subset of R^{n_x}, s denotes the number of modes (i.e., the number of affine functions defining f), A_i in R^{n_y x (n_x+1)} are parameter matrices, and the sets X_i, i = 1, ..., s, are polyhedra defined by the linear inequalities

  X_i := {x in R^{n_x} : H_i x <= D_i},   (2)

with H_i and D_i real-valued matrices, forming a complete polyhedral partition of the space X. (A collection {X_i}_{i=1}^s is a complete partition of the regressor domain X if the union of the sets X_i equals X and the interiors of X_i and X_j are disjoint for all i different from j.) The following regression problem is addressed.

Problem 1 (PWA regression): Consider the data-generating system

  y(k) = f_o(x(k)) + e_o(k),   (3)

where e_o(k) in R^{n_y} is a zero-mean additive random noise independent of x(k) and f_o : X -> R^{n_y} is an unknown function, possibly discontinuous and not necessarily PWA. The PWA regression problem aims at finding a PWA approximation f of the unknown regression function f_o based on a set of N observations {x(k), y(k)}_{k=1}^N.

Estimating a PWA approximation f of the true function f_o requires (i) choosing the number of modes s, (ii) computing the parameter matrices {A_i}_{i=1}^s that characterize the PWA submodels defining f, and (iii) finding the polyhedral partition {X_i}_{i=1}^s of the regressor space X where those submodels are defined. In this work, we assume that the number of modes s is fixed by the user. When choosing the value of s, one must take into account the tradeoff between data fitting and model complexity.
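To make the objects in Problem 1 concrete, the following minimal NumPy sketch evaluates a PWA map of the form (1)-(2), given the parameter matrices A_i and the polyhedra (H_i, D_i); all function and variable names are illustrative, and membership ties on region boundaries are simply resolved by the first match.

    import numpy as np

    def pwa_eval(x, A_list, H_list, D_list):
        # Evaluate the PWA map (1): return A_i @ [x; 1] for the first region
        # X_i = {x : H_i x <= D_i} that contains x.
        x = np.asarray(x, dtype=float)
        xe = np.append(x, 1.0)                      # extended regressor [x; 1]
        for A_i, H_i, D_i in zip(A_list, H_list, D_list):
            if np.all(H_i @ x <= D_i + 1e-9):       # membership test for polyhedron X_i
                return A_i @ xe
        raise ValueError("x lies outside the polyhedral partition of X")

    # Two scalar modes split at x = 0: f(x) = x + 1 for x <= 0, f(x) = -2x + 1 for x >= 0
    A = [np.array([[1.0, 1.0]]), np.array([[-2.0, 1.0]])]
    H = [np.array([[1.0]]), np.array([[-1.0]])]
    D = [np.array([0.0]), np.array([0.0])]
    print(pwa_eval([-0.5], A, H, D), pwa_eval([0.5], A, H, D))   # [0.5] [0.]
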
For small values of s, the PWA map f cannot accurately capture the shape of the nonlinear function f_o. On the other hand, increasing the number of modes also increases the degrees of freedom in the description of the PWA map f, which may cause overfitting and poor generalization to unseen data (i.e., the final estimate becomes sensitive to the noise corrupting the observations), besides increasing the complexity of the estimation procedure and of the resulting PWA model. The value of s can be chosen through cross-validation, by testing, for different values of s, the performance of the estimated model on a fresh set of data (i.e., a calibration data set) which is not used in the estimation of the parameter matrices {A_i}_{i=1}^s and of the polyhedral partition {X_i}_{i=1}^s. The following subsections show how the problems of identifying hybrid dynamical models in PWARX form and Linear Parameter-Varying models can be cast as the PWA regression problem.

B. Identification of PWARX models

Model (1) represents a Multi-Input Multi-Output (MIMO) PWARX discrete-time dynamical system if the regressor vector x(k) at sampling step k is a collection of past input and output observations, i.e.,

  x(k) = [y'(k-1) y'(k-2) ... y'(k-n_a) u'(k-1) u'(k-2) ... u'(k-n_b)]',   (4)

where u(k) in R^{n_u} and y(k) in R^{n_y} denote the observed input and output signals at time k, respectively.

C. Identification of LPV models

Linear parameter-varying systems can be seen as an extension of Linear Time-Invariant (LTI) systems: the dynamic relation between input and output signals is linear, but it can change over time according to a measurable time-varying signal p(k) in R^{n_p}, the so-called scheduling vector.

The identification problem of MIMO LPV systems can be formulated as the PWA regression Problem 1 by adopting the MIMO LPV-ARX form

  y(k) = a_0(p(k)) + sum_{j=1}^{n_a} a_j(p(k)) y(k-j) + sum_{j=1}^{n_b} a_{j+n_a}(p(k)) u(k-j),

where the coefficients a_j(p(k)), j = 0, ..., n_a + n_b, are PWA functions of p(k) defined as

  a_j(p(k)) = A_i^j(p(k))  if p(k) in P_i,  i = 1, ..., s,   (5)

and p(k) in P, a subset of R^{n_p}, is the value of the scheduling vector at time k. By imposing that each entry of the matrix A_i^j(p(k)) depends affinely on the scheduling vector p(k), the LPV identification problem reduces to reconstructing the PWA mapping of the p-dependent coefficient functions {a_j(p(k))}_{j=0}^{n_a+n_b} over the polyhedral partition {P_i}_{i=1}^s of the scheduling vector set P. In this case, the N-length sequence of observations {x(k), y(k)}_{k=1}^N is built with the extended regressor

  x(k) = [y'(k-1) ... y'(k-n_a) u'(k-1) ... u'(k-n_b)]' (x) [1 p'(k)]'.

Note that, since the polyhedral partition {P_i}_{i=1}^s is not fixed a priori, the underlying dependencies of the functions a_j(p(k)) on the scheduling vector p(k) are directly reconstructed from data. This is one of the main advantages w.r.t. widely used parametric LPV identification approaches (see, e.g., [3], [16]), which require parameterizing a_j(p(k)) as a linear combination of basis functions specified a priori (e.g., polynomial functions).

III. PWA REGRESSION ALGORITHM

In this section, we summarize the two-stage algorithm proposed in [6] to solve the general PWA regression problem (namely, Problem 1). Its application to the identification of PWARX and LPV models is discussed in Section IV through two simulation examples. The proposed PWA regression algorithm consists of two stages:

S1. Simultaneous clustering of the regressor vectors {x(k)}_{k=1}^N and estimation of the parameter matrices {A_i}_{i=1}^s describing the PWA map in (1). This is performed recursively, by processing the training pairs {x(k), y(k)} sequentially.

S2. Computation of a polyhedral partition of the regressor space X through a multi-category linear separation method. This stage is performed after stage S1, and it can be executed either offline or online (recursively).

A. Iterative clustering and parameter estimation

Stage S1 is carried out as described in Algorithm 1, which extends to the case of multiple linear regressions and clustering the approach proposed in [1] for solving recursive least-squares problems via inverse QR decomposition.

Algorithm 1: Recursive clustering and parameter estimation
Input: sequence of observations {x(k), y(k)}_{k=1}^N; desired number s of modes; noise covariance matrix Lambda_e; initial conditions for the matrices A_i, the cluster centroids c_i, and the centroid covariance matrices R_i, i = 1, ..., s.
1. let C_i <- empty set, i = 1, ..., s;
2. for k = 1, ..., N do
   2.1. let e_i(k) <- y(k) - A_i [x'(k) 1]', i = 1, ..., s;
   2.2. let i(k) <- arg min over i = 1, ..., s of
        (x(k) - c_i)' R_i^{-1} (x(k) - c_i) + e_i'(k) Lambda_e^{-1} e_i(k);   (6)
   2.3. let C_{i(k)} <- C_{i(k)} union {x(k)};
   2.4. update the parameter matrix A_{i(k)} using the inverse QR factorization approach of [1];
   2.5. let dc_{i(k)} <- (x(k) - c_{i(k)}) / |C_{i(k)}|;
   2.6. update the centroid of cluster C_{i(k)}:
        c_{i(k)} <- c_{i(k)} + dc_{i(k)};   (7)
   2.7. update the centroid covariance matrix of cluster C_{i(k)}:
        R_{i(k)} <- ((|C_{i(k)}| - 1)/|C_{i(k)}|) (R_{i(k)} + dc_{i(k)} dc_{i(k)}') + (1/|C_{i(k)}|) (x(k) - c_{i(k)})(x(k) - c_{i(k)})';   (8)
3. end for;
4. end.
Output: estimated matrices A_1, ..., A_s and clusters C_1, ..., C_s.
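The following NumPy sketch mirrors the structure of Algorithm 1 under simplifying assumptions: step 2.4 is replaced by a standard covariance-form recursive least-squares update instead of the inverse-QR factorization of [1], a small ridge term keeps R_i invertible for nearly empty clusters, and simple default initializations (zero A_i, randomly picked centroids, identity R_i) are hard-coded. It is a sketch of the idea, not the authors' implementation, and all names are illustrative.

    import numpy as np

    def recursive_pwa_fit(X, Y, s, Lam_e=None, seed=0):
        # X: N-by-nx array of regressors, Y: N-by-ny array of outputs.
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        N, nx = X.shape
        ny = Y.shape[1]
        rng = np.random.default_rng(seed)
        Lam_inv = np.eye(ny) if Lam_e is None else np.linalg.inv(Lam_e)
        A = [np.zeros((ny, nx + 1)) for _ in range(s)]     # parameter matrices A_i
        c = [X[rng.integers(N)].copy() for _ in range(s)]  # centroids c_i (random samples)
        R = [np.eye(nx) for _ in range(s)]                 # centroid covariance matrices R_i
        P = [1e3 * np.eye(nx + 1) for _ in range(s)]       # RLS matrices, one per mode
        count = np.zeros(s, dtype=int)
        labels = np.empty(N, dtype=int)
        for k in range(N):
            xe = np.append(X[k], 1.0)                      # extended regressor [x(k); 1]
            # steps 2.1-2.2: mode selection via criterion (6)
            cost = np.empty(s)
            for i in range(s):
                e = Y[k] - A[i] @ xe
                d = X[k] - c[i]
                cost[i] = d @ np.linalg.solve(R[i] + 1e-8 * np.eye(nx), d) + e @ Lam_inv @ e
            i_k = int(np.argmin(cost))
            labels[k] = i_k
            count[i_k] += 1
            n = count[i_k]
            # step 2.4 (simplified): covariance-form RLS update of A_{i(k)}
            Px = P[i_k] @ xe
            g = Px / (1.0 + xe @ Px)
            A[i_k] = A[i_k] + np.outer(Y[k] - A[i_k] @ xe, g)
            P[i_k] = P[i_k] - np.outer(g, Px)
            # steps 2.5-2.7: recursive centroid and covariance updates (7)-(8)
            dc = (X[k] - c[i_k]) / n
            c[i_k] = c[i_k] + dc
            if n > 1:   # keep the identity initialization for singleton clusters
                R[i_k] = ((n - 1) / n) * (R[i_k] + np.outer(dc, dc)) \
                         + np.outer(X[k] - c[i_k], X[k] - c[i_k]) / n
        return A, c, R, labels

In practice, as discussed next, the quality of the estimate improves if such a routine is executed more than once on the same batch, re-using the returned A_i, c_i and R_i as initial conditions.
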
Algorithm 1 requires an initial guess for the parameter matrices A_i and the cluster centroids c_i, i = 1, ..., s. Zero matrices A_i, randomly chosen centroids c_i, and identity centroid covariance matrices R_i are a possible choice. Alternatively, if Algorithm 1 can be executed in a batch fashion on a subset of data, one can initialize the parameter matrices A_1, ..., A_s all equal to the best linear model

  A_i <- arg min over A of sum_{k=1}^N ||y(k) - A [x'(k) 1]'||^2,  i = 1, ..., s,   (9)

that fits all data, and classify the regressors {x(k)}_{k=1}^N through k-means clustering [11] to compute the cluster centroids and the centroid covariance matrices. Otherwise, the initial guess for the parameters can be obtained by executing Algorithm 1 without the first term in (6) and updating the cluster centroids and covariance matrices only once all the data have been classified. Clearly, the final estimate of the parameter matrices A_1, ..., A_s and of the clusters C_1, ..., C_s depends on their initial values. When working offline on a batch of identification data, estimation quality may be improved by repeating Algorithm 1 iteratively, using its output as the initial condition for the following execution.

After computing the estimation error e_i(k) for all models i at Step 2.1, Step 2.2 picks the best mode i(k) with which the current sample x(k) must be associated (the reader is referred to [6] for a stochastic interpretation of the discrimination criterion (6) used to cluster the observed data points). Step 2.4 updates the parameter matrix A_{i(k)} associated with the selected mode i(k) using the recursive least-squares algorithm of [1] based on inverse QR factorization. Note that only the parameters of the matrix A_{i(k)} associated with the selected mode i(k) are updated at time k, while the parameters associated with the other modes are not.

Remark 1: In case the covariance matrix Lambda_e of the noise e_o cannot be determined a priori, a possible choice is Lambda_e = I_{n_y} (i.e., the regression errors are not weighted in (6)). Alternatively, if Algorithm 1 is executed in a batch fashion and iteratively repeated by using its output as initial condition for the next execution, an estimate of Lambda_e can be computed at the end of each execution as

  Lambda_e_hat = (1/(N - s)) sum_{i=1}^s sum_{k: x(k) in C_i} (y(k) - A_i [x'(k) 1]') (y(k) - A_i [x'(k) 1]')'.

B. Partitioning the regressor space

When one is interested in obtaining also the partition {X_i}_{i=1}^s, besides the affine models {A_i}_{i=1}^s, the clusters {C_i}_{i=1}^s must be separated via linear multicategory discrimination. The linear multicategory discrimination problem amounts to computing a convex piecewise affine separator function phi : R^{n_x} -> R discriminating between the clusters C_1, ..., C_s. The piecewise affine separator phi is defined as the maximum of s affine functions {phi_i(x)}_{i=1}^s, i.e.,

  phi(x) = max over i = 1, ..., s of phi_i(x),   (10)

where each affine function phi_i(x) is described by the parameters omega_i in R^{n_x} and gamma_i in R, namely

  phi_i(x) = x' omega_i - gamma_i.   (11)

For i = 1, ..., s, let M_i be the m_i x n_x matrix (with m_i denoting the cardinality of cluster C_i) obtained by stacking the regressors x(k) belonging to C_i in its rows. According to [8], in the case of linearly separable clusters, the piecewise affine separator phi satisfies the conditions

  M_i omega_i - 1_{m_i} gamma_i >= M_i omega_j - 1_{m_i} gamma_j + 1_{m_i},  i, j = 1, ..., s, i != j.   (12)

1) Off-line multicategory discrimination: Rather than solving a robust linear programming (RLP) problem as in [8], the parameters {omega_i, gamma_i}_{i=1}^s are computed by solving the convex unconstrained optimization problem

  min over xi of (lambda/2) sum_{i=1}^s (||omega_i||^2 + gamma_i^2) + sum_{i=1}^s sum_{j=1, j!=i}^s (1/m_i) ||( M_i (omega_j - omega_i) - 1_{m_i}(gamma_j - gamma_i) + 1_{m_i} )_+||^2,   (13)

with xi = [omega_1' ... omega_s' gamma_1 ... gamma_s]'. Problem (13) aims at minimizing the averaged squared 2-norm of the violation of the inequalities in (12) (a minimal numerical sketch of the separator (10)-(11) and of problem (13) is given below). The regularization parameter lambda > 0 is introduced so that the objective function in (13) is strongly convex. Problem (13) can be efficiently solved via the Regularized Piecewise-Smooth Newton (RPSN) method described in [6] and originally proposed in [5].

2) Recursive multicategory discrimination: As an alternative to the off-line approach to multicategory discrimination, or in addition to it for refining the partition phi online based on streaming data, a recursive approach to problem (13) based on on-line convex programming can be used. Let us treat the data points x in R^{n_x} as random vectors and assume that an oracle function i : R^{n_x} -> {1, ..., s} exists that assigns to any x in R^{n_x} the corresponding mode i(x) in {1, ..., s}.
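Before developing the stochastic formulation, the deterministic objects above can be made concrete with a short sketch: the parameters (omega_i, gamma_i) of the separator (10)-(11) are fitted here by plain full-batch gradient descent on the objective (13), whereas the paper solves (13) with the RPSN method; names, step sizes and iteration counts are illustrative only.

    import numpy as np

    def fit_pwl_separator(X, labels, s, lam=1e-5, lr=1e-2, iters=2000):
        # Full-batch gradient descent on objective (13).
        # Returns (Omega, gamma), with phi_i(x) = Omega[i] @ x - gamma[i].
        X = np.asarray(X, dtype=float)
        nx = X.shape[1]
        M = [X[labels == i] for i in range(s)]          # regressors of cluster C_i, stacked row-wise
        Omega = np.zeros((s, nx))
        gamma = np.zeros(s)
        for _ in range(iters):
            gO = lam * Omega.copy()
            gg = lam * gamma.copy()
            for i in range(s):
                if len(M[i]) == 0:
                    continue
                for j in range(s):
                    if j == i:
                        continue
                    # hinge-type violation of condition (12) for the points of cluster C_i vs mode j
                    v = np.maximum(M[i] @ (Omega[j] - Omega[i]) - gamma[j] + gamma[i] + 1.0, 0.0)
                    gO[j] += 2.0 / len(M[i]) * (M[i].T @ v)
                    gO[i] -= 2.0 / len(M[i]) * (M[i].T @ v)
                    gg[j] -= 2.0 / len(M[i]) * v.sum()
                    gg[i] += 2.0 / len(M[i]) * v.sum()
            Omega -= lr * gO
            gamma -= lr * gg
        return Omega, gamma

    def assign_mode(x, Omega, gamma):
        # Region of the polyhedral partition induced by the separator: argmax_i phi_i(x).
        return int(np.argmax(Omega @ np.asarray(x, dtype=float) - gamma))

Replacing the full-batch gradient with the gradient of a per-sample loss, drawn one data point at a time, leads to the recursive scheme described next.
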
Function i implicitly defines clusters in the data-point space R^{n_x}. Let us also assume that the values

  pi_i = Prob[i(x) = i] = integral over R^{n_x} of delta(i, i(x)) p(x) dx

are known for all i = 1, ..., s, where delta(i, j) = 1 if i = j and zero otherwise. The multicategory discrimination problem (13) can then be generalized to the following convex regularized stochastic optimization problem

  xi* = arg min over xi of E_x[l(x, xi)] + (lambda/2) xi' xi,   (14)
  l(x, xi) = (1/pi_{i(x)}) sum_{j=1, j!=i(x)}^s ( ( x'(omega_j - omega_{i(x)}) - gamma_j + gamma_{i(x)} + 1 )_+ )^2,

where E_x[.] denotes the expected value w.r.t. x. Problem (14) aims at minimizing, on average over x, the violation of the conditions in (12) for i = i(x). The coefficients pi_i can be estimated a priori offline from a subset of data, namely pi_i = m_i/N, and possibly updated iteratively. Numerical experiments have shown, however, that constant and uniform coefficients pi_i = 1/s work equally well. The solution of (14), which provides the piecewise affine multicategory discrimination function phi, can be computed by online convex optimization, through the Averaged Stochastic Gradient Descent (ASGD) method described in [6].

IV. SIMULATION EXAMPLES

In this section, we consider two numerical examples to show the application of the proposed PWA regression approach to the identification of PWARX and LPV systems.

All computations are carried out on an Intel Core i7 2.40-GHz processor with 4 GB of RAM running MATLAB R2014b. In both examples, the output used in the training phase is corrupted by an additive zero-mean white noise e_o with Gaussian distribution. The effect of the measurement noise on the output signal is quantified through the Signal-to-Noise Ratio (SNR), defined for the i-th output channel as

  SNR_i = 10 log10 ( sum_{k=1}^N (y_i(k) - e_{o,i}(k))^2 / sum_{k=1}^N e_{o,i}^2(k) ),   (15)

with e_{o,i}(k) denoting the i-th component of e_o(k). The results obtained after the training phase are validated on a noiseless data sequence. Let y_o and y_hat denote, respectively, the vectors stacking the actual outputs and the simulated outputs of the estimated model, let y_bar_{o,i} be the sample mean of the i-th output, and N_V the length of the validation data sequence. The Best Fit Rate (BFR) and Mean Square Error (MSE) indicators

  BFR_i = max{1 - ||y_{o,i} - y_hat_i|| / ||y_{o,i} - y_bar_{o,i}||, 0},   (16)
  MSE_i = (1/N_V) sum_{k=1}^{N_V} (y_{o,i}(k) - y_hat_i(k))^2,   (17)

defined for each output channel i = 1, ..., n_y, are used to assess the quality of the estimated models (a computational sketch of these indicators is given below).

A. Identification of a multivariable PWARX system

Let the data be generated by a MIMO PWARX system with two inputs and two outputs, whose difference equation expresses y(k) in R^2 as the sum of an affine term in y(k-1) and u(k-1), of a componentwise max(., 0) term applied to a second affine function of y(k-1) and u(k-1), and of the noise e_o(k). The system is therefore characterized by four operating modes, given by the possible combinations of sign of the two components of the first vector argument of the max operator. The excitation input u(k) is a uniformly distributed white noise sequence of length N = 4000, and e_o(k) in R^2 is a zero-mean white noise with covariance matrix Lambda_e, corresponding to signal-to-noise ratios SNR_1 = 8.7 dB and SNR_2 = 6.9 dB on the first and second output channels, respectively. We run Algorithm 1 with s = 4, equal to the number of modes of the data-generating system. The initial guesses for the cluster centroids and covariance matrices are computed by running an instance of Algorithm 1 without the first term in (6) (i.e., only the modeling error is used as a discrimination criterion to cluster the observed data points at the first run). Algorithm 1 is then run again 5 times with the full criterion (6), initializing A_i, c_i and R_i with the output of the previous run. The clusters generated by Algorithm 1 are then separated through the off-line multicategory discrimination method described in Section III-B.1, by solving the convex optimization problem (13) via the RPSN method described in [6], initialized with {omega_i, gamma_i}_{i=1}^s = 0. The regularization parameter lambda is set to 10^-5. The quality of the estimated PWA model is assessed w.r.t. a validation set of N_V = 500 samples. The true output y_o and the open-loop simulated output y_hat generated by the estimated model are plotted in Fig. 1, along with the simulation error y_o(k) - y_hat(k).

TABLE I. PWARX identification: BFR and MSE on the two output channels.
  BFR_1: 96. %   BFR_2: 96.3 %   MSE_1: -   MSE_2: -

TABLE II. PWARX identification: BFR on the two output channels vs length N of the training set.
                             N = 4000   N = 0000   N = 00000
  BFR_1  (Off-line) RLP [8]   96.0 %     96.5 %     99.0 %
  BFR_1  (Off-line) RPSN      96. %      96.4 %     98.9 %
  BFR_1  (On-line) ASGD       86.7 %     95.0 %     96.7 %
  BFR_2  (Off-line) RLP [8]   96. %      96.9 %     99.0 %
  BFR_2  (Off-line) RPSN      96.3 %     96.8 %     99.0 %
  BFR_2  (On-line) ASGD       87.4 %     95. %      96.4 %

TABLE III. PWARX identification: CPU time required to partition the regressor space vs length N of the training set.
                             N = 4000   N = 0000   N = 00000
  (Off-line) RLP [8]            - s       3.7 s      .435 s
  (Off-line) RPSN              0.06 s       - s         - s
  (On-line) ASGD               0.03 s     0.03 s        - s
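For reference, the validation indicators (16)-(17) can be computed channel by channel as in the following sketch, where y_true and y_sim are assumed to store the noiseless validation outputs and the open-loop simulated outputs, one column per output channel:

    import numpy as np

    def bfr_mse(y_true, y_sim):
        # Best Fit Rate (16) and Mean Square Error (17), one value per output channel.
        y_true = np.asarray(y_true, dtype=float)
        y_sim = np.asarray(y_sim, dtype=float)
        if y_true.ndim == 1:                      # allow single-output signals
            y_true, y_sim = y_true[:, None], y_sim[:, None]
        err = y_true - y_sim
        bfr = np.maximum(1.0 - np.linalg.norm(err, axis=0)
                         / np.linalg.norm(y_true - y_true.mean(axis=0), axis=0), 0.0)
        mse = np.mean(err ** 2, axis=0)
        return bfr, mse
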
For the sake of visualization, only a short window of samples related to the first output channel is shown in Fig. 1. The obtained BFR and MSE are reported in Table I. The estimated polyhedral partition of the regressor space misclassifies only a small fraction of the validation samples. As the accuracy of the final model estimate and the total CPU time are influenced by the number M of runs of Algorithm 1, the performance of the proposed learning approach has been tested with respect to both M and the size N of the training set. The obtained BFR as a function of the number of runs of Algorithm 1 is plotted in Fig. 2 for different values of N. Clearly, there is no improvement in the BFR on the two output channels after about 8 runs. The total CPU time for estimating the PWARX model is 0.76 s, of which 0.06 s are taken to solve the linear multi-category discrimination problem (13). For the sake of comparison, both the robust linear programming approach of [8] and the on-line method described in Section III-B.2 are also run to generate the partition. Results in Table II show that all three algorithms used to compute the polyhedral partition of the regressor space lead to an accurate estimate of the system, with BFRs larger than 95 % (except when N = 4000 and the on-line multi-category discrimination method is used). In executing the on-line approach of Section III-B.2, the weights pi_i and the initial guess of phi used by the stochastic gradient method are computed by executing the batch algorithm of Section III-B.1 on the first 3000 training samples.

Fig. 2. PWARX identification: BFR on the first and on the second output channel vs number of runs of Algorithm 1, for different lengths N of the training set.

Furthermore, the results show that the estimated models become more accurate as the number of training samples increases. The CPU times taken to compute the polyhedral partition are reported in Table III, which shows that, for a large training set (i.e., N = 00000), the off-line RPSN and the on-line ASGD methods are about 300x and 600x faster, respectively, than the robust linear programming method of [8].

B. Identification of an LPV system

Let the data be collected from a MIMO LPV system of the form

  y(k) = A_bar(p(k)) y(k-1) + B_bar(p(k)) u(k-1) + e_o(k),   (18)

where the entries a_bar_{i,j}(p(k)) of A_bar(p(k)) and b_bar_{i,j}(p(k)) of B_bar(p(k)) are nonlinear functions of the two-dimensional scheduling vector p(k) = [p_1(k) p_2(k)]': they include saturated linear combinations of p_1(k) and p_2(k) (e.g., a_bar_{1,1}(p(k)) equals 0.4(p_1(k) + p_2(k)) saturated at magnitude 0.3), a sign-type switching function of magnitude 0.5, a sinusoidal term, and a constant entry. Both the input u(k) and the scheduling vector p(k) are uniformly distributed white noise sequences, independent of each other. The noise e_o(k) in R^2 is a zero-mean white noise with covariance matrix Lambda_e, corresponding to signal-to-noise ratios SNR_1 = 4 dB and SNR_2 = 7 dB on the first and on the second output channel, respectively.

Fig. 3. LPV identification: polyhedral partition of the scheduling vector space P.

The goal is to estimate, from the gathered data, a PWA approximation of the p-dependent nonlinear functions a_bar_{i,j} and b_bar_{i,j} defining the behaviour of the LPV data-generating system (18).

1) Choice of the number of modes: The number s of polyhedral regions defining the partition of the scheduling vector space P is chosen through cross-validation. Specifically, the training data set is split into two disjoint sets: the first part is used to estimate a PWA approximation of a_bar_{i,j} and b_bar_{i,j}, along with the polyhedral partition of the scheduling vector space P, for different values of s; for each value of s, Algorithm 1 is run multiple times. The second part of the training data is used to assess the quality of the identified LPV models, and for each value of s the BFR on the two output channels is computed. Among the identified LPV models, the one providing the largest aggregated BFR_T = BFR_1 + BFR_2 is selected, which fixes the number s of polyhedral regions. The corresponding polyhedral partition, obtained by solving problem (13) through the regularized piecewise-smooth Newton method described in [6], is plotted in Fig. 3 (the Hybrid Toolbox for MATLAB [4] has been used to plot the polytopes).

2) Model quality assessment: The quality of the estimated LPV model is then assessed w.r.t. a validation dataset, consisting of a new sequence of N_V noiseless samples used neither to estimate the LPV model nor to select the number of modes s.
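Open-loop simulation of the estimated model on validation data follows the structure of Section II-C: at every step, the active region of the scheduling-space partition is found through the separator of Section III-B, and the corresponding affine parameterization is applied to the extended regressor. The sketch below assumes one particular parameterization, a single stacked coefficient matrix Theta[i] per region acting on the Kronecker-product regressor of Section II-C; it is an illustration of the mechanism, not necessarily the exact implementation used by the authors.

    import numpy as np

    def simulate_lpv_pwa(u, p, Theta, Omega, gamma, na, nb):
        # Theta[i]: coefficient matrix of region i, acting on the extended regressor
        # kron([y(k-1); ...; y(k-na); u(k-1); ...; u(k-nb); 1], [1; p(k)]).
        # (Omega, gamma): piecewise affine separator over the scheduling space,
        # so the active region is argmax_i (Omega[i] @ p(k) - gamma[i]).
        u = np.asarray(u, dtype=float)
        p = np.asarray(p, dtype=float)
        N = u.shape[0]
        ny = Theta[0].shape[0]
        y = np.zeros((N, ny))                 # zero initial conditions for simplicity
        for k in range(max(na, nb), N):
            phi = np.concatenate([y[k - j] for j in range(1, na + 1)]
                                 + [u[k - j] for j in range(1, nb + 1)]
                                 + [np.ones(1)])
            x = np.kron(phi, np.concatenate(([1.0], p[k])))   # affine dependence on p(k)
            i = int(np.argmax(Omega @ p[k] - gamma))          # region containing p(k)
            y[k] = Theta[i] @ x
        return y

The per-step cost reduces to evaluating the s affine functions of the separator plus one matrix-vector product, which is consistent with the low online evaluation times discussed below.
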
For the sake of comparison, the nonlinear coefficient functions a_bar_{i,j}(p(k)) and b_bar_{i,j}(p(k)) are also estimated through the parametric LPV identification approach proposed in [3], by parameterizing them as fourth-order polynomials in the two-dimensional scheduling vector p(k). The true outputs y_o and the simulated output sequences y_hat of the estimated LPV models are plotted in Fig. 4, along with the simulation error y_o(k) - y_hat(k). For the sake of visualization, only a short window of samples related to the second output channel is reported. The BFR and MSE on the two output channels are reported in Table IV.

Fig. 1. PWARX identification: output signal and simulation error on the first output channel. (a) Output signal: black = true, red = estimated. (b) Simulation error.

TABLE IV. LPV identification: BFR and MSE of the LPV models estimated by using the proposed PWA regression approach and the parametric LPV identification approach of [3].
                         BFR_1   BFR_2   MSE_1   MSE_2
  PWA regression          87 %    84 %     -       -
  parametric LPV [3]      80 %    70 %     -       -

The obtained results show that the proposed LPV identification approach, based on the PWA approximation of the coefficient functions a_bar_{i,j}(p(k)) and b_bar_{i,j}(p(k)), outperforms the parametric LPV identification approach of [3]. We also remark that the online computational time required to evaluate the output of the LPV model, given the current value of the scheduling vector p and the past input/output observations, is very limited: about 40 microseconds of it are spent determining which region the current scheduling vector belongs to. Note that the latter step only requires computing the maximum of the s affine functions {phi_i(p)}_{i=1}^s defining the piecewise affine separator phi(p) in (10). This low online computational burden is mainly due to the PWA structure of the coefficient functions describing the LPV model, and it allows using the estimated LPV model in applications requiring a fast online determination of the operating mode, such as gain scheduling or LPV model predictive control.

3) Performance of the multi-category discrimination algorithms: The CPU time required to estimate the LPV model through the proposed PWA regression approach is 759 s, including the cross-validation phase used to choose the number of modes s. For the selected value of s, the CPU time required to compute the LPV model is 4 s, 0.4 s of which are spent computing the polyhedral partition via problem (13) (the robust linear programming multicategory discrimination algorithm of [8] takes about ten times longer). For a more exhaustive comparison between the regularized piecewise-smooth Newton approach used to solve problem (13) and the robust linear programming algorithm of [8], the CPU time required by the two algorithms to partition the scheduling parameter space is plotted, as a function of s, in Fig. 5. Fig. 5 also shows the CPU time required by the averaged stochastic gradient descent algorithm of [6] to compute the solution of problem (14). The weights pi_i and the initial estimate used by the averaged stochastic gradient descent algorithm are computed by solving problem (13) on a first batch of training samples; the remaining 9000 training samples are processed recursively. The regularization parameter lambda in problems (13) and (14) is set to 10^-5. In order to test the performance of the three multicategory discrimination algorithms in terms of model accuracy, the aggregate best fit rate BFR_T obtained by the three algorithms is plotted, as a function of s, in Fig. 6. Results in Figs. 5 and 6 show that:
- the CPU time required by all three discrimination algorithms to partition the scheduling vector space increases with the number of modes s (Fig. 5).
This is due to the fact that the number of parameters in xi defining the piecewise affine separator phi(x) in (10) increases linearly with s;
- the (offline) regularized piecewise-smooth Newton method and the (online) averaged stochastic gradient method used to solve problems (13) and (14), respectively, are faster than the robust linear programming based approach of [8], by a factor of about 6 or more;
- in terms of model accuracy, the robust linear programming approach of [8] and the regularized piecewise-smooth Newton method achieve similar performance (Fig. 6), while, for a few values of s (including s = 4), the averaged stochastic gradient descent algorithm does not provide an accurate partition of the scheduling vector space, leading to LPV models with a noticeably smaller aggregate best fit rate. This means that, for those values of s, the solution of the averaged stochastic gradient descent algorithm fails to converge to the batch solution of problem (13) with the available number of training samples.

Fig. 5. LPV identification: CPU time required to partition the scheduling vector space vs number of modes s.

Fig. 6. LPV identification: aggregate best fit rate BFR_T vs number of modes s.

Fig. 4. LPV model estimate: output signal and simulation error on the second output channel. (a) Output signal: black = true, red = PWA regression estimate, green = polynomial parameterization. (b) Simulation error: red = PWA regression estimate, green = polynomial parameterization.

V. CONCLUSIONS

In this paper we have reviewed the PWA regression algorithm introduced in [6] and discussed its application to the identification of PWARX and LPV dynamical systems. The approach consists of two stages: first, the observed regressors are simultaneously clustered and an affine submodel is estimated for each cluster; second, a convex unconstrained multicategory discrimination problem is solved to determine a piecewise linear separator function and a polyhedral partition of the regressor space. The regularized piecewise-smooth Newton method used to compute the polyhedral partition of the regressor space makes the proposed regression algorithm computationally effective for offline learning, while the averaged stochastic gradient descent method allows processing the data samples recursively (i.e., one data sample at a time), so that the overall algorithm can be implemented in real time, for instance for adaptive control purposes. Future research includes the extension of the PWA regression algorithm presented in the paper to the identification of hybrid and LPV systems under different noise conditions (e.g., Output-Error and Box-Jenkins model structures), and the generalization to piecewise-nonlinear models (such as piecewise polynomial ones) by feeding regression data manipulated through nonlinear basis functions to Algorithm 1 and to the numerical algorithms used to solve the multicategory discrimination problems.

REFERENCES

[1] S.T. Alexander and A.L. Ghirnikar. A method for recursive least squares filtering based upon an inverse QR decomposition. IEEE Transactions on Signal Processing, 41(1):20-30, 1993.
A clustering technique for the identification of piecewise affine systems. Automatica, 39():05 7, 003. [0] A. L. Juloski, S. Weiland, and W. P. M. H. Heemels. A bayesian approach to identification of hybrid systems. IEEE Transactions on Automatic Control, 50(0):50 533, 005. [] A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Pattern recognition, 36():45 46, 003. [] H. Nakada, K. Takaba, and T. Katayama. Identification of piecewise affine systems based on statistical clustering technique. Automatica, 4(5):905 93, 005. [3] H. Ohlsson and L. Ljung. Identification of switched linear regression models using sum-of-norms regularization. Automatica, 49(4): , 03. [4] N. Ozay, C. Lagoa, and M. Sznaier. Set membership identification of switched linear systems with known number of subsystems. Automatica, 5:80 9, 05. [5] S. Paoletti, A. L. Juloski, G. Ferrari-Trecate, and R. Vidal. Identification of hybrid systems a tutorial. European journal of control, 3():4 60, 007. [6] D. Piga, P. Cox, R. Tóth, and V. Laurain. LPV system identification under noise corrupted scheduling and output signal observations. Automatica, 53:39 338, 05. [7] J. Roll, A. Bemporad, and L. Ljung. Identification of piecewise affine systems via mixed-integer programming. Automatica, 40():37 50, 004. [8] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 67 88, 996. [9] V. Vapnik. Statistical learning theory. Wiley New York, 998.


Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

PATTERN CLASSIFICATION AND SCENE ANALYSIS

PATTERN CLASSIFICATION AND SCENE ANALYSIS PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis. Ying Shen, SSE, Tongji University Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

More information

CS294-1 Assignment 2 Report

CS294-1 Assignment 2 Report CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

More information

The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R

The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R Journal of Machine Learning Research (2013) Submitted ; Published The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R Haotian Pang Han Liu Robert Vanderbei Princeton

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

5 Machine Learning Abstractions and Numerical Optimization

5 Machine Learning Abstractions and Numerical Optimization Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust

More information

An algorithm for censored quantile regressions. Abstract

An algorithm for censored quantile regressions. Abstract An algorithm for censored quantile regressions Thanasis Stengos University of Guelph Dianqin Wang University of Guelph Abstract In this paper, we present an algorithm for Censored Quantile Regression (CQR)

More information

3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm

3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm 3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm Bing Han, Chris Paulson, and Dapeng Wu Department of Electrical and Computer Engineering University of Florida

More information

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008 Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a

More information

All lecture slides will be available at CSC2515_Winter15.html

All lecture slides will be available at  CSC2515_Winter15.html CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Computation of the Constrained Infinite Time Linear Quadratic Optimal Control Problem

Computation of the Constrained Infinite Time Linear Quadratic Optimal Control Problem Computation of the Constrained Infinite Time Linear Quadratic Optimal Control Problem July 5, Introduction Abstract Problem Statement and Properties In this paper we will consider discrete-time linear

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

Optimization. Industrial AI Lab.

Optimization. Industrial AI Lab. Optimization Industrial AI Lab. Optimization An important tool in 1) Engineering problem solving and 2) Decision science People optimize Nature optimizes 2 Optimization People optimize (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Unlabeled Data Classification by Support Vector Machines

Unlabeled Data Classification by Support Vector Machines Unlabeled Data Classification by Support Vector Machines Glenn Fung & Olvi L. Mangasarian University of Wisconsin Madison www.cs.wisc.edu/ olvi www.cs.wisc.edu/ gfung The General Problem Given: Points

More information

COMS 4771 Support Vector Machines. Nakul Verma

COMS 4771 Support Vector Machines. Nakul Verma COMS 4771 Support Vector Machines Nakul Verma Last time Decision boundaries for classification Linear decision boundary (linear classification) The Perceptron algorithm Mistake bound for the perceptron

More information

732A54/TDDE31 Big Data Analytics

732A54/TDDE31 Big Data Analytics 732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information

THE preceding chapters were all devoted to the analysis of images and signals which

THE preceding chapters were all devoted to the analysis of images and signals which Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to

More information

3 Feature Selection & Feature Extraction

3 Feature Selection & Feature Extraction 3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy

More information

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material

More information

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................

More information

Algebraic Iterative Methods for Computed Tomography

Algebraic Iterative Methods for Computed Tomography Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic

More information

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Support Vector Machines.

Support Vector Machines. Support Vector Machines srihari@buffalo.edu SVM Discussion Overview 1. Overview of SVMs 2. Margin Geometry 3. SVM Optimization 4. Overlapping Distributions 5. Relationship to Logistic Regression 6. Dealing

More information

3 Nonlinear Regression

3 Nonlinear Regression 3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information