Identification of Hybrid and Linear Parameter Varying Models via Recursive Piecewise Affine Regression and Discrimination
|
|
- Bethanie Lindsey
- 5 years ago
- Views:
Transcription
1 Identification of Hybrid and Linear Parameter Varying Models via Recursive Piecewise Affine Regression and Discrimination Valentina Breschi, Alberto Bemporad and Dario Piga Abstract Piecewise affine (PWA) regression is a supervised learning method which aims at estimating, from a set of training data, a PWA map approximating the relationship between a set of explanatory variables (commonly called regressors) and continuous-valued outputs. PWA maps are described by a finite collection of affine functions, defined over a polyhedral partition of the regressor space. The PWA regression problem amounts to estimating both the affine functions and the polyhedral partition from data. In this paper, we describe a recursive and numerically efficient PWA regression algorithm based on a combined use of (i) clustering and multi-model recursive least-squares techniques and (ii) linear multi-category discrimination. Then, we discuss its application to the identification of multi-input multi-output PWA dynamical models in autoregressive form and to the identification of linear parameter-varying (LPV) models. In the former identification problem, the regressor is defined as a collection of past input and output observations, and local linear time-invariant models are estimated together with the partition of the regressor space. In the latter problem, PWA approximations of the scheduling variable dependent coefficients of an LPV model are estimated from the data, along with a partition of the scheduling variable domain. I. INTRODUCTION One of the most challenging problems in regression learning is choosing a proper structure to describe the underlying relation between input (or regressor) vectors x R nx and continuous-valued target outputs y R n y. On the one hand, too simple models (e.g., linear functions of the independent variable x) might not be flexible enough to describe the input/output relationship, leading to a systematic error when predicting the output for a given input. On the other hand, complex model structures tend to be computationally difficult and to overfit the training data, leading to poor generalization to unseen data. This problem is commonly referred in machine learning and system identification as bias-variance tradeoff [9]. Several methods exist in the literature to deal with the bias-variance tradeoff. A first possibility is to use overparameterized models, described by a large set of a priori fixed nonlinear basis functions, from which only a few might actually be needed for an accurate approximation of the underlying relationship among the variables. Sparse estimation methods like the LASSO [8] are then used to *This work was partially supported by the European Commission under project H00-SPIRE DISIRE - Distributed In-Situ Sensors Integrated into Raw Material and Energy Feedstock ( eu/disire/). The authors are with the IMT Institute of Advanced Studies Lucca, 5500 Lucca, Italy. s: {valentina.breschi, alberto.bemporad, identify a low-complex model, by minimizing the fitting error along with a penalty term aiming at minimizing the complexity of the model. However, the final estimate strongly depends on the chosen basis functions. Motivated by the need of simple and flexible model structures, algorithms for computing PieceWise Affine (PWA) approximations of nonlinear regression functions have been developed. Piecewise affine functions are given by partitioning the regressor space into a finite number of polyhedral regions, and by considering an affine submodel on each polyhedron. Both the partition of the regressor space and the parameters defining each affine submodel are inferred from data. PWA regression algorithms can be also extended to the identification of hybrid and PieceWise Affine autoregressive with exogenous inputs (PWARX) systems by defining the regressor as the collection of past input and output observations (see [5] for a literature overview). Among the algorithms developed for the identification of hybrid and PWARX systems, we mention the bounded-error methods [7], [4], the clustering-based procedures [], [9], [], the mixed integer quadratic programming approach [7], the Bayesian method [0], and the sparse optimization method [3]. All these algorithms are executed in a batch fashion, and thus they are not suitable for an on-line implementation, apart from the method in []. This paper proposes an alternative, numerically efficient algorithm, for the identification of discrete-time multi-input multi-output PWARX models. The same algorithm can be properly extended to the identification of Linear Parameter- Varying (LPV) models. In this case, a PWA approximation of the scheduling variable dependent LPV model coefficients is estimated from the training data, along with a polyhedral partition of the scheduling variable domain. The proposed identification approach relies on the two-stage PWA regression algorithm recently developed by the authors and detailed in [6]. The main idea of the algorithm is to sequentially process the observed regressor/output pairs and assign each pair to the affine submodel that is most compatible to it. At the same time, while doing the assignment of a pair, the parameters of the corresponding submodel are updated via a recursive least-squares technique based on inverse QR decomposition. The second stage starts once all the observations have been classified. At this stage, the regressor domain is partitioned into polyhedral regions through a piecewise linear separator, which is computed by solving a convex unconstrained optimization problem, whose solution can be efficiently computed either off-line through a
2 regularized piecewise-smooth Newton method or recursively through an averaged stochastic gradient descent algorithm. Overall, the proposed identification approach turns out to be computationally very effective for offline learning, and suitable for online learning. The paper is organized as follows. The general PWA regression problem is formulated in Section II. The specific problems of data-driven modeling of discrete-time PWARX models and LPV models are formulated in Section II-B and Section II-C, respectively, extending the contribution in [6] by showing its applicability to system identification problems. Section III summarizes the general PWA regression method, which consists of an algorithm to simultaneously cluster the observed regressors and update the model parameters, and two multi-category discrimination algorithms to compute the polyhedral partition of the regressor domain (more details on these algorithms can be found in [6]). The application of the PWA regression algorithm described in Section III to the identification of PWARX and LPV models is discussed in Section IV through two numerical examples. Concluding remarks and some directions for future research are drawn in Section V. A. Notation Let R n be the set of real vectors of dimension n. Let I {,,..., } be a finite set of integers and denote by I the cardinality of I. Given a vector a R n, let a i denote the i-th entry of a, a I the subvector obtained by collecting the entries a i for all i I, a the Euclidean norm of a, a + a vector whose i-th element is max{a i, 0}. Given two vectors a, b R n, max(a, b) is the vector whose i-th component is max{a i, b i }. Given a matrix A R n m, A denotes the transpose of A, A i the i-th row of A. Given two matrices A and B, A B denotes the Kronecker product between A and B. Let I n be the identity matrix of size n and n be the n-dimensional column vector of ones. II. PROBLEM STATEMENT A. Piecewise affine regression Consider the vector-valued PWA function f : X R ny A [ x ] if x X, f(x) =. () A s [ x ] if x X s, where x R n x, X R n x, s N denotes the number of modes (i.e., the number of affine functions defining f), A i R n y (n x +) are parameter matrices, and the sets X i, i =,..., s are polyhedra defined by the linear inequalities X i. = {x R n x : H i x D i }, () with H i and D i being real-valued matrices, that form a complete polyhedral partition of the space X. The following A collection {X i } s i= is a complete partition of the regressor domain X if s i= X i = X and X i X j =, i j, with X i denoting the interior of X i. regression problem is addressed: Problem : PWA regression Consider the data-generating system: y(k) = f o (x(k)) + e o (k), (3) where e o (k) R n y is a zero-mean additive random noise independent of x(k) and f o : X R ny is an unknown function, possibly discontinuous and not necessarily PWA. The PWA regression problem aims at finding a PWA approximation f of the unknown regression function f o based on a set of N observations {x(k), y(k)} N k=. Estimating a PWA approximation f of the true function f o requires (i) choosing the number of modes s, (ii) computing the parameter matrices {A i } s i= that characterize the PWA submodels defining f, and (iii) finding the polyhedral partition {X i } s i= of the regressor space X where those PWA submodels are defined. In this work, we assume that the number of modes s is fixed by the user. When choosing the value of s, one must take into account the tradeoff between data fitting and model complexity. For small values of s, the PWA map f cannot accurately capture the shape of the nonlinear function f o. On the other hand, increasing the number of modes also increases the degrees of freedom in the description of the PWA map f, which may cause overfitting and poor generalization to unseen data (i.e., the final estimate is sensitive to the noise corrupting the observations), besides increasing the complexity of the estimation procedure and of the resulting PWA model. The value of s can be chosen through cross-calibration based procedures, by testing, for different values of s, the performance of the estimated model on a fresh set of data (i.e., calibration data set) which is not used in the estimation of the parameter matrices {A i } s i= and of polyhedral partition {X i } s i=. The following subsections show how the problems of identifying hybrid dynamical models in PWARX form and Linear Parameter-Varying models can be cast as the PWA regression problem. B. Identification of PWARX models Model () represents a Multi-Input Multi-Output (MIMO) PWARX discrete-time dynamical system if the regressor vector x(k) at the sampling step k is a collection of past input and output observations, i.e., x(k) = [y (k ) y (k ) y (k n a ) u (k ) u (k ) u (k n b )], (4) where u(k) R n u and y(k) R n y denote the observed input and output signals at time k, respectively. C. Identification of LPV models Linear parameter-varying systems can be seen as an extension of Linear Time-Invariant (LTI) systems, as the dynamic relation between input and output signals is linear but it can
3 change over time according to a measurable time-varying signal p(t) R np, the so-called scheduling vector. The identification problem of MIMO LPV systems can be formulated as the PWA regression Problem by adopting the following MIMO LPV-ARX form n a n b y(k)=a 0 (p(k))+ a j (p(k))y(k j)+ a j+na (p(k))u(k j) j= j= where a j (p(k)), j = 0,..., n a + n b, are PWA functions of p(k) defined as: A j (p(k)) if p(k) P, a j (p(k)) =. (5) A j s(p(k)) if p(k) P s, and p(k) P R np is the value of the scheduling vector at time k. By imposing that each entry of matrix A j i (p(k)) depends affinely on the scheduling vector p(k), the LPV identification problem reduces to reconstructing the PWA mapping of the p-dependent coefficient functions {a j (p(k))} n a+n b j=0 over the polyhedral partition {P i } s i= of the scheduling vector set P. In this case, the N-length sequence of observations {x(k), y(k)} N k= requires x(k) = [y (k )... y (k n a ) u (k )... u (k n b )] [ p (k)]. Note that, since the polyhedral partition {P i } s i= is not fixed a priori, the underlying dependencies of the functions a j (p(k)) on the scheduling vector p(k) are directly reconstructed from data. This represents one of the main advantages w.r.t. to widely used parametric LPV identification approaches (see, e.g., [3], [6]), which in turn require to parameterize a j (p(k)) as a linear combination of some basis functions specified a priori (e.g., polynomial functions). III. PWA REGRESSION ALGORITHM In this section, we summarize the two-stage algorithm to solve the general PWA regression problem (namely, Problem ) proposed in [6]. Its application to the identification of PWARX and LPV models is discussed in Section IV through two simulation examples. The proposed PWA regression algorithm consists of two stages: S. Simultaneous clustering of the regressor vectors {x(k)} N k= and estimation of the parameter matrices {A i } s i= describing the PWA map in (). This is performed recursively, by processing the training pairs {x(k), y(k)} sequentially. S. Computation of a polyhedral partition of the regressor space X through a multi-category linear separation method. This stage is performed after stage S, and it can be executed either offline or online (recursively). A. Iterative clustering and parameter estimation Stage S is carried out as described in Algorithm. The algorithm is an extension to the case of multiple linear regressions and clustering of the approach proposed in [] for solving recursive least squares problems using inverse QR decomposition. Algorithm Recursive clustering and parameter estimation algorithm Input: Sequence of observations {x(k), y(k)} N k=, desired number s of modes, noise covariance matrix Λ e ; initial condition for matrices A i, cluster centroids c i, and centroid covariance matrices R i, i =,..., s.. let C i, i =,..., s;. for k =,..., N do [.. let e i (k) y(k) A ] i x(k), i =,..., s;.. let i(k) arg min (x(k) c i) R i (x(k) c i )+ i=,...,s e i (k) Λ e e i (k);.3. let C i(k) C i(k) {x(k)};.4. update the parameter matrices A i(k) using the inverse QR factorization approach of [];.5. let δc i(k) C i(k) (x(k) c i(k) );.6. update the centroid c i(k) of cluster C i(k) (6) c i(k) c i(k) + δc i(k) ; (7).7. update the centroid covariance matrix R i(k) for cluster C i(k) C i(k) R i(k) C i(k) R i(k) + δc i(k) δc i(k) + (8) 3. end for; 4. end. [ ] [ ] C i(k) x(k) ci(k) x(k) ci(k) ; Output: Estimated matrices A,..., A s, clusters C,..., C s. Algorithm requires an initial guess for the parameter matrices A i and cluster centroids c i, i =,..., s. Zero matrices A i, randomly chosen centroids c i, and identity centroid covariance matrices R i are a possible choice. In alternative, if Algorithm can be executed in a batch fashion on a subset of data, one can initialize the parameter matrices A,..., A s all equal to the best linear model A i arg min A N k= y(k) A [ x(k) ], i =,..., s (9) that fits all data, and classify the regressors {x(k)} N k= through k-means clustering [], to compute the cluster centroids and the centroid covariance matrices. Otherwise, the initial guess for the parameters could be obtained executing Algorithm without the first term in (6) and updating the cluster centroids and covariance matrices only once all the data have been classified. Clearly, the final estimate of the parameter matrices A,..., A s and of the clusters C,..., C s depends on their initial values. When working offline on
4 a batch of identification data, estimation quality may be improved by repeating Algorithm iteratively, by using its output as initial condition for its following execution. After computing the estimation error e i (k) for all models i at Step., Step. picks up the best mode i(k) to which the current sample x(k) must be associated with (the reader is referred to as [6] for a stochastic interpretation of the discrimination criterium (6) used to cluster the observed data points). Step.4 updates the parameter matrix A i(k) associated to the selected mode i(k) using the recursive least-squares algorithm of [] based on inverse QR factorization. Note that only the parameters of the matrix A i(k) associated to the selected mode i(k) are updated at time k, while the parameters associated to the other modes are not. Remark : In case the noise covariance matrix Λ e of noise e o cannot be determined a priori, a possible choice for Λ e is Λ e = I ny (i.e., the regression errors are not weighted in (6)). Alternatively, if Algorithm is executed in a batch fashion and iteratively repeated by using the output as initial condition for the next execution, an estimate ˆΛ e of Λ e can be computed at the end of each execution as ˆΛ e = N s N i= k= x(k) C i B. Partitioning the regressor space ( [ y(k) ]) ( [ Ai x(k) y(k) ]) Ai x(k). When one is interested in getting also the partition {X i } s i=, besides the affine models {A i} s i=, the clusters {C i } s i= must be separated via linear multicategory discrimination. The linear multicategory discrimination problem amounts to computing a convex piecewise affine separator function ϕ : R n x R discriminating between the clusters C,..., C s. The piecewise affine separator ϕ is defined as the maximum of s affine functions {ϕ i (x)} s i=, i.e., ϕ(x) = max ϕ i(x). (0) i=,...,s The affine functions ϕ i (x) are described by the parameters ω i R n x and γ i R, namely: [ ] ϕ i (x) = [ x ω ] i. () γ i For i =,..., s, let M i be a m i n x dimensional matrix (with m i denoting the cardinality of cluster C i ) obtained by stacking the regressors x(k) belonging to C i in its rows. According to [8], in case of linearly separable clusters, the piecewise-affine separator ϕ satisfies the conditions: [ Mi mi ] [ ω i γ i ] [ M i mi ] [ ω j γ j ] + mi, () i, j =,..., s, i j. ) Off-line multicategory discrimination: Rather than solving a robust linear programming (RLP) problem as in [8], the parameters {ω i, γ i } s i= are computed by solving the convex unconstrained optimization problem λ s ( min ω i + (γ i ) ) + (3) ξ i= s s ( [ ) [ M i mi ] ω j ω i + mi, i= j = j i m i γ j γ i ] with ξ = [ (ω )... (ω s ) γ... γ s ]. Problem (3) aims at minimizing the averaged squared -norm of the violation of the inequalities in (). The regularization parameter λ > 0 is introduced so that the objective function in (3) is strongly convex. Problem (3) can be efficiently solved via the Regularized Piecewise-Smooth Newton (RPSN) method explained in [6] and originally proposed in [5]. ) Recursive multicategory discrimination: As an alternative to the off-line approach to the multicategory discrimination, or in addition to it for refining the partition ϕ online based on streaming data, a recursive approach to solve problem (3) based on on-line convex programming can be used. Let us treat the data-points x R nx as random vectors and assume that an oracle function i : R n x : {,..., s} exists that, to any x R nx, assigns the corresponding mode i(x) {,..., s}. Function i implicitly defines clusters in the datapoint space R nx. Let us also assume that the following values π i = Prob[i(x) = i] = δ(i, i(x))p(x)dx R nx are known for all i =,..., s, where δ(i, j) = if i = j, zero otherwise. The multicategory discrimination problem (3) can be the generalized to the following convex regularized stochastic optimization problem ξ = min E x R nx [l(x, ξ)] + λ ξ ξ (4) s ( l(x, ξ) = x (ω j ω i(x) ) γ j + γ i(x) + π i(x) j = j i(x) where E x [ ] denotes the expected value w.r.t. x. Problem (4) aims at violating the least, on average over x, the condition in () for i = i(x). The coefficients π i can be a-priori estimated offline from a subset of data, namely π i = m i N, and possibly iteratively updated. Numerical experiments have shown however that constant and uniform coefficients π = s work equally well. The solution of (4), which provides the piecewise affine multicategory discrimination function ϕ, can be computed by online convex optimization, through the Averaged Stochastic Gradient Descent (ASGD) method described in [6]. IV. SIMULATION EXAMPLES In this section, we consider two numerical examples to show the application of the proposed PWA regression approach to the identification of PWARX and LPV systems. + ) +
5 All computations are carried out on an i7.40-ghz Intel core processor with 4 GB of RAM running MATLAB R04b. In both examples, the output used in the training phase is corrupted by an additive zero-mean white noise e o with Gaussian distribution. The effect of measurement noise on the output signal is quantified through the Signal-to-Noise Ratio (SNR), that is defined for the i-th output channel as N k= SNR i = 0 log (y i(k) e o,i (k)) N k= e o,i (k), (5) with e o,i (k) denoting the i-th component of e o (k). The results obtained after the training phase are validated on a noiseless data sequence. Let y o and ŷ denote, respectively, the vectors staking the actual and the simulated outputs of the estimated model, let ȳ o,i be the sample mean of the i-th output, and N V the length of the validation data sequence. The Best Fit Rate (BFR) and Mean Square Error (MSE) indicators { BFR i = max y } o,i ŷ i, 0 (6) y o,i ȳ o,i MSE i = N V N V (y o,i (k) ŷ i (k)) (7) k= defined for each output channel i, i =,..., n y, are used to assess the quality of the estimated models. A. Identification of a multivariable PWARX system Let the system generating the data be a MIMO PWARX system described by the difference equation [ ] y (k) y = [ ] [ ] y (k ) (k) y + [ ] [ ] u (k ) (k ) u (k ) { [ + [ ] + max ] [ ] y (k ) y (k ) [ + [ ] u(k ) u (k ) ] + [ ], [ 0 0 ] } + e o (k), which is characterized by s = 4 operating modes, given by the possible combinations of sign of the components of the first vector argument of the max operator. The excitation input u(k) is a white noise sequence with uniform distribution in the box [ 0] [ ] and length N = 4000, e o (k) R is a zero-mean white noise with covariance matrix Λ e = [ ]. This corresponds to signal-to-noise ratios equal to SNR = 8.7 db and SNR = 6.9 db on the first and second output channels, respectively. We run Algorithm with s = s = 4. The initial guess for the cluster centroids and covariance matrices are computed by running an instance of Algorithm without the first term in (6) (i.e., only the modeling error is used as a discrimination criterion to cluster the observed data points at the first run). Algorithm is then run again for 5 times, with the full criterion (6), by initializing A i, c i and R i with the output of the previous run. The clusters generated by Algorithm are then separated through the off-line multicategory discrimination method described in Section III-B., by solving the convex optimization problem (3) via the RPSN method TABLE I PWARX IDENTIFICATION: BFR AND MSE ON THE TWO OUTPUT CHANNELS BFR BFR MSE MSE 96.% 96.3% TABLE II PWARX IDENTIFICATION: BFR ON THE TWO OUTPUT CHANNELS VS BFR BFR LENGTH N OF THE TRAINING SET. N = 4000 N = 0000 N = (Off-line) RLP [8] 96.0 % 96.5 % 99.0 % (Off-line) RPSN 96. % 96.4 % 98.9 % (On-line) ASGD 86.7 % 95.0 % 96.7 % (Off-line) RLP [8] 96. % 96.9 % 99.0 % (Off-line) RPSN 96.3 % 96.8 % 99.0 % (On-line) ASGD 87.4 % 95. % 96.4 % TABLE III PWARX IDENTIFICATION: CPU TIME REQUIRED TO PARTITION THE REGRESSOR SPACE VS LENGTH N OF THE TRAINING SET. N = 4000 N = 0000 N = (Off-line) RLP [8] s 3.7 s.435 s (Off-line) RPSN 0.06 s s s (On-line) ASGD 0.03 s 0.03 s s described in [6], which is initialized with {ω i, γ i } s i= = 0. The regularization parameter λ is set to 0 5. The quality of the estimated PWA model is assessed w.r.t. a validation set of N V = 500 samples. The true output y o and the open-loop simulated output ŷ generated by the estimated model are plotted in Fig., along with the simulation error y o (k) ŷ(k). For the sake of visualization, only the samples from time 0 to 00 related to the first channel are shown in Fig.. The obtained BFR and MSE are reported in Table I. The estimated polyhedral partition of the regressor space is such that only out of 500 data samples are misclassified (i.e.,.4 % of the whole validation set). As the accuracy of the final model estimate and the total CPU time is influenced by the number M of runs of Algorithm, the performance of the proposed learning approach has been tested with respect to both M and the dimension N of the training set. The obtained BFR as a function of iterations of Algorithm is plotted in Fig. for different values of N. Clearly, there is no improvement in BFR on the two output channels after about 8 runs. The total CPU time for estimating the PWARX model is 0.76 s, of which 0.06 s are taken to solve the linear multi-category discrimination problem (3). For the sake of comparison, both the robust linear programming approach of [8] and the on-line method described in Section III-B. are also run to generate the partition. Results in Table II show that all of the three algorithms used to compute the polyhedral partition of the regressor space lead to an accurate estimate of the system, with BFRs larger then 95 % (except when N = 4000 and the on-line multi-category In executing the on-line approach in Section III-B., the weights π i and the initial guess of ϕ used by the stochastic gradient method are computed by executing the batch Algorithm in Section III-B. on the first 3000 training samples.
6 BFR.4. N=4000 N=0000 N= runs BFR.4. N=4000 N=0000 N= runs Fig.. PWARX identification: BFR on the first and on the second output channel vs number of runs of Algorithm discrimination method is used). Furthermore, the results show that the estimated models become more accurate as the number of training samples increases. The CPU times taken to compute the polyhedral partition are reported in Table III, which shows that, for a large training set (i.e., N = 00000), the off-line RPSN and the on-line ASGD method are about 300x and 600x faster, respectively, than the robust linear programming method of [8]. B. Identification of an LPV system Let the data be collected from the MIMO LPV system [ ] [ ] [ ] y (k) ā, (p(k)) ā y (k) =, (p(k)) y (k ) y (k ) ] [ ] + u(k ) u (k ) + e o (k), where ā, (p(k)) = ā, (p(k)) ā, (p(k)) [ b, (p(k)) b, (p(k)) b, (p(k)) b, (p(k)) 0.3 if 0.4 (p (k) + p (k)) 0.3, 0.3 if 0.4 (p (k) + p (k)) 0.3, 0.4 (p (k) + p (k)) otherwise, ā, (p(k)) = 0.5 ( p (k) + p (k) ), ā, (p(k)) = p (k) p (k), 0.5 if p (k) < 0, ā, (p(k)) = 0 if p (k) = 0, 0.5 if p (k) > 0, b, (p(k)) = 3p (k) + p (k), b, (p(k)) = { 0.5 if ( p (k) + p (k) ) 0.5, ( p (k) + p (k) ) otherwise, b, (p(k)) = sin {p (k) p (k)}, b, (p(k)) = 0. (8) Both the input u(k) and the scheduling vector p(k) are white noise sequences (independent of each other) of length N = 000 with uniform distribution in the boxes [ ] [ ] and [ ] [ ], respectively. The noise covariance matrix of e o (k) R is Λ e = [ ]. This corresponds to signal-to-noise ratios on the first and on the second output channel equal to SNR = 4 db and SNR = 7 db, respectively. Fig. 3. LPV identification: polyhedral partition of the scheduling vector space P The goal is to estimate, from the gathered data, a PWA approximation of the p-dependent nonlinear functions ā i,j and b i,j defining the behaviour of the LPV data-generating system (8). ) Choice of the number of modes: The number s of polyhedral regions defining the partition of the scheduling vector space P = [ ] [ ] is chosen through cross validation. Specifically, the 000-length training data set is split into two disjoint sets. The first 0000 samples are used to estimate a PWA approximation of ā i,j and b i,j, along with the polyhedral partition of the scheduling vector space P, for different values of s in the range For each value of s, the identification Algorithm is run 0 times. The second part of training data (i.e., the remaining 000 samples) is used to assess the quality of the identified LPV models. For each value of s, the BFR on the two output channels is computed. Among the identified LPV models, the one providing the largest aggregated BFR T = BFR + BFR is selected, which corresponds to s = 0 polyhedral regions. The computed polyhedral partition, obtained by solving problem (3) through the regularized piecewisesmooth Newton method explained in [6], is plotted in Fig. 3 (the Hybrid Toolbox for MATLAB [4] has been used to plot the polytopes in Fig. 3). ) Model quality assessment: The quality of the estimated LPV model is then assessed w.r.t. a validation dataset, consisting of a new sequence of N V = 000 noiseless samples used neither to estimate the LPV model nor to select the number of modes s. For the sake of comparison, the nonlinear coefficient functions ā i,j (p(k)) and b i,j (p(k)) are also estimated through the parametric LPV identification approach proposed in [3], by parameterizing the nonlinear functions ā i,j (p(k)) and b i,j (p(k)) as fourth-order polynomials in the two-dimensional scheduling vector p(k). The true outputs y o and the simulated output sequences ŷ of the estimated LPV models are plotted in Fig. 4, along with the simulation error y o (k) ŷ(k). For the sake of visualization, only the samples from time 0 to 00 related to the second channel are reported. The BFR and MSE on the
7 yo,ŷ time (samples) (a) First output channel (output signal): black = true, red = estimated Fig.. yo ŷ time (samples) (b) First output channel (simulation error) PWARX identification: output signal and simulation error on the first output channel TABLE IV LPV IDENTIFICATION: BFR AND MSE OF THE LPV MODELS ESTIMATED BY USING THE PROPOSED PWA REGRESSION APPROACH AND THE PARAMETRIC LPV IDENTIFICATION APPROACH OF [3] BFR BFR MSE MSE PWA regression 87 % 84 % parametric LPV [3] 80 % 70 % two output channels are reported in Table IV. The obtained results show that the proposed LPV identification approach based on the PWA approximation of the coefficient functions ā i,j (p(k)) and b i,j (p(k))) outperforms the parametric LPV identification approach in [3]. We also remark that the online computational time required to evaluate the output of the LPV model, given the current value of the scheduling vector p and the past input/output observations is about 0 µs, 40 µs of which are required to evaluate which region the current scheduling vector belongs to. Note that the latter step requires to compute the maximum of s = 0 affine functions {ϕ i ( p)} s i= defining the piecewise affine separator ϕ( p) in (0). This relatively low online computational time is mainly due to the PWA structure of the coefficient functions describing the LPV model, and it allows to use the estimated LPV model in applications requiring a fast online determination of the operating mode, such as in gain scheduling or in LPV model predictive control. 3) Performance of multi-category discrimination algorithms: The CPU time required to estimate the LPV model through the proposed PWA regression approach is 759 s. This includes the cross-validation phase to compute the number of modes s. For s = 0, the CPU time required to compute the LPV model is 4 s, 0.4 s of which are spent to compute the polyhedral partition via problem (3) (the robust linear programming multicategorical discrimination algorithm of [8] takes 4. seconds, i.e., almost 0x slower). For a more exhaustive comparison between the regularized piecewise-smooth Newton approach used to solve problem (3) and the robust linear programming algorithm of [8], the CPU time required by the two algorithms to partition the scheduling parameter space is plotted, as a function of s, in Fig. 5. Fig. 5 also shows the CPU time required by the averaged stochastic gradient descent algorithm in [6] to compute the solution of problem (4). The weights π i and the initial estimate used by the averaged stochastic gradient descent algorithm are computed by solving problem (3) on the first 000 training samples. The remaining 9000 training samples are processed recursively. The regularization parameter λ in problems (3) and (4) is set to 0 5. In order to test the performance of the three multicategory discrimination algorithms in terms of model accuracy, the aggregate best fit rate BFR T obtained by using the three algorithms is plotted, as a function of s, in Fig. 6. Results in Figs. 5 and 6 show that: the CPU time required by all of the three discrimination algorithms to partition the scheduling vector space increases with the number of modes s (Fig. 5). This is due to the fact that the number of parameters ξ defining the piecewise affine separator ϕ(x) in (0) increases linearly with s; the (offline) regularized piecewise-smooth Newton method and the (online) average stochastic gradient method used to solve problem (3) and (4), respectively, are faster (from 6x to 0x) than the robust linear programming based approach of [8]. in terms of model accuracy, the robust linear programming approach of [8] and the regularized piecewisesmooth Newton method achieve similar performance (Fig. 6), while, for s =, s = 4 and s = 0, the averaged stochastic gradient descent algorithm does not provide an accurate partition of the scheduling vector space, leading to LPV models with an aggregate best fit rates smaller than.. This means that, for s =, 4, 0 the solution of the averaged stochastic gradient descent algorithm fails to converge to the batch solution of problem (3) when only N = 0000 training samples are used. time [s] 0 5 RLP [8] RPSN ASGD s Fig. 5. LPV identification: CPU time required to partition the scheduling vector space vs number of modes (s).
8 yo,ŷ 0 yo ŷ time (samples) (a) Second output channel (output signal): black = true, red = PWA regression, green = polynomial parameterization time (samples) (b) Second output channel (simulation error): red = PWA regression, green = polynomial parameterization Fig. 4. LPV model estimate: output signal and simulation error on the second output channel. Estimate obtained through the PWA regression approach (red lines); estimate obtained through the parametric approach [3] (green lines) V. CONCLUSIONS In this paper we have reviewed the PWA regression algorithm introduced in [6], and discussed its application to the identification of PWARX and LPV dynamical systems. The approach consists of two stages: first, simultaneously cluster the observed regressors and estimate an affine submodel for each cluster; second, solve a convex unconstrained multicategory discrimination problem to determine a piecewise linear separator function and a polyhedral partition of the regressor space. The regularized piecewise-smooth Newton method used to compute the polyhedral partition of the regressor space makes the proposed regression algorithm computationally effective for offline learning, while the averaged stochastic gradient descent allows to process the data samples recursively (i.e., one data sample at the time), so that the overall algorithm can be implemented in real-time, such as for adaptive control purposes. Future research includes the extension of the PWA regression algorithm presented in the paper to the identification of hybrid and LPV systems under different noise conditions (e.g., Output-Error and Box-Jenkins model structures), and the generalization to piecewise-nonlinear models (such as piecewise polynomial) by feeding regression data manipulated through nonlinear basis functions to Algorithm and to the numerical algorithms used to solve multicategory discrimination problems. REFERENCES [] S.T. Alexander and A.L. Ghirnikar. A method for recursive least squares filtering based upon an inverse QR decomposition. IEEE Trans. Signal Processing, 4():0 30, 993. BFR T RLP [8] RPSN ASGD s Fig. 6. LPV identification: aggregate best fit rate BFR T vs number of modes (s). [] L. Bako, K. Boukharouba, E. Duviella, and S. Lecoeuche. A recursive identification algorithm for switched linear/affine models. Nonlinear Analysis: Hybrid Systems, 5():4 53, 0. [3] B. A. Bamieh and L. Giarré. Identification of linear parametervarying models. International Journal of Robust Nonlinear Control, (9):84 853, 00. [4] A. Bemporad. Hybrid Toolbox - User s Guide, 004. url: bemporad/hybrid/toolbox. [5] A. Bemporad, D. Bernardini, and P. Patrinos. A convex feasibility approach to anytime model predictive control. Technical report, IMT Institute for Advanced Studies, Lucca, February 05. http: //arxiv.org/abs/ [6] A. Bemporad, V. Breschi, and D. Piga. Piecewise affine regression via recursive multiple least squares and multicategory discrimination. Technical report, IMT Institute for Advanced Studies, Lucca, October 05. Submitted to Automatica. Available at: dariopiga.com/tr/tr_pwareg_bbp_05.pdf. [7] A. Bemporad, A. Garulli, S. Paoletti, and A. Vicino. A bounded-error approach to piecewise affine system identification. IEEE Transactions on Automatic Control, 50(0): , 005. [8] K.P. Bennett and O.L. Mangasarian. Multicategory discrimination via linear programming. Optimization Methods and Software, 3:7 39, 994. [9] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari. A clustering technique for the identification of piecewise affine systems. Automatica, 39():05 7, 003. [0] A. L. Juloski, S. Weiland, and W. P. M. H. Heemels. A bayesian approach to identification of hybrid systems. IEEE Transactions on Automatic Control, 50(0):50 533, 005. [] A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Pattern recognition, 36():45 46, 003. [] H. Nakada, K. Takaba, and T. Katayama. Identification of piecewise affine systems based on statistical clustering technique. Automatica, 4(5):905 93, 005. [3] H. Ohlsson and L. Ljung. Identification of switched linear regression models using sum-of-norms regularization. Automatica, 49(4): , 03. [4] N. Ozay, C. Lagoa, and M. Sznaier. Set membership identification of switched linear systems with known number of subsystems. Automatica, 5:80 9, 05. [5] S. Paoletti, A. L. Juloski, G. Ferrari-Trecate, and R. Vidal. Identification of hybrid systems a tutorial. European journal of control, 3():4 60, 007. [6] D. Piga, P. Cox, R. Tóth, and V. Laurain. LPV system identification under noise corrupted scheduling and output signal observations. Automatica, 53:39 338, 05. [7] J. Roll, A. Bemporad, and L. Ljung. Identification of piecewise affine systems via mixed-integer programming. Automatica, 40():37 50, 004. [8] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 67 88, 996. [9] V. Vapnik. Statistical learning theory. Wiley New York, 998.
A Greedy Approach to Identification of Piecewise Affine Models
A Greedy Approach to Identification of Piecewise Affine Models Alberto Bemporad, Andrea Garulli, Simone Paoletti, and Antonio Vicino Università di Siena, Dipartimento di Ingegneria dell Informazione, Via
More informationA new hybrid system identification algorithm with automatic tuning
Proceedings of the 7th World Congress The International Federation of Automatic Control A new hybrid system identification algorithm with automatic tuning Fabien Lauer and Gérard Bloch Centre de Recherche
More informationA Learning Algorithm for Piecewise Linear Regression
A Learning Algorithm for Piecewise Linear Regression Giancarlo Ferrari-Trecate 1, arco uselli 2, Diego Liberati 3, anfred orari 1 1 nstitute für Automatik, ETHZ - ETL CH 8092 Zürich, Switzerland 2 stituto
More information634 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 3, MARCH 2012
634 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 3, MARCH 2012 A Sparsification Approach to Set Membership Identification of Switched Affine Systems Necmiye Ozay, Member, IEEE, Mario Sznaier, Member,
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationA Clustering Technique for the Identification of Piecewise Affine Systems
A Clustering Technique for the Identification of Piecewise Affine Systems Giancarlo Ferrari-Trecate 1, Marco Muselli 2, Diego Liberati 2, and Manfred Morari 1 1 Institut für Automatik, ETH - Swiss Federal
More informationMultiresponse Sparse Regression with Application to Multidimensional Scaling
Multiresponse Sparse Regression with Application to Multidimensional Scaling Timo Similä and Jarkko Tikka Helsinki University of Technology, Laboratory of Computer and Information Science P.O. Box 54,
More informationSupervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationA continuous optimization framework for hybrid system identification
A continuous optimization framework for hybrid system identification Fabien Lauer, Gérard Bloch, René Vidal To cite this version: Fabien Lauer, Gérard Bloch, René Vidal. A continuous optimization framework
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationRELATIVELY OPTIMAL CONTROL: THE STATIC SOLUTION
RELATIVELY OPTIMAL CONTROL: THE STATIC SOLUTION Franco Blanchini,1 Felice Andrea Pellegrino Dipartimento di Matematica e Informatica Università di Udine via delle Scienze, 208 33100, Udine, Italy blanchini@uniud.it,
More informationNeural Networks. Theory And Practice. Marco Del Vecchio 19/07/2017. Warwick Manufacturing Group University of Warwick
Neural Networks Theory And Practice Marco Del Vecchio marco@delvecchiomarco.com Warwick Manufacturing Group University of Warwick 19/07/2017 Outline I 1 Introduction 2 Linear Regression Models 3 Linear
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationMixture Models and EM
Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering
More informationLinear Machine: A Novel Approach to Point Location Problem
Preprints of the 10th IFAC International Symposium on Dynamics and Control of Process Systems The International Federation of Automatic Control Linear Machine: A Novel Approach to Point Location Problem
More informationLocally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling
Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling Moritz Baecher May 15, 29 1 Introduction Edge-preserving smoothing and super-resolution are classic and important
More information1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics
1. Introduction EE 546, Univ of Washington, Spring 2016 performance of numerical methods complexity bounds structural convex optimization course goals and topics 1 1 Some course info Welcome to EE 546!
More informationPerformance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018
Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More informationRecent Developments in Model-based Derivative-free Optimization
Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form
More informationGAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.
GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential
More informationMachine Learning: Think Big and Parallel
Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationCS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes
1 CS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview
More informationPerceptron as a graph
Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2
More informationContents. I Basics 1. Copyright by SIAM. Unauthorized reproduction of this article is prohibited.
page v Preface xiii I Basics 1 1 Optimization Models 3 1.1 Introduction... 3 1.2 Optimization: An Informal Introduction... 4 1.3 Linear Equations... 7 1.4 Linear Optimization... 10 Exercises... 12 1.5
More informationGenerating the Reduced Set by Systematic Sampling
Generating the Reduced Set by Systematic Sampling Chien-Chung Chang and Yuh-Jye Lee Email: {D9115009, yuh-jye}@mail.ntust.edu.tw Department of Computer Science and Information Engineering National Taiwan
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationELEG Compressive Sensing and Sparse Signal Representations
ELEG 867 - Compressive Sensing and Sparse Signal Representations Gonzalo R. Arce Depart. of Electrical and Computer Engineering University of Delaware Fall 211 Compressive Sensing G. Arce Fall, 211 1 /
More informationBilevel Sparse Coding
Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional
More informationConvexization in Markov Chain Monte Carlo
in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationSoftware Documentation of the Potential Support Vector Machine
Software Documentation of the Potential Support Vector Machine Tilman Knebel and Sepp Hochreiter Department of Electrical Engineering and Computer Science Technische Universität Berlin 10587 Berlin, Germany
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationControlling Hybrid Systems
Controlling Hybrid Systems From Theory to Application Manfred Morari M. Baotic, F. Christophersen, T. Geyer, P. Grieder, M. Kvasnica, G. Papafotiou Lessons learned from a decade of Hybrid System Research
More informationNotes on Robust Estimation David J. Fleet Allan Jepson March 30, 005 Robust Estimataion. The field of robust statistics [3, 4] is concerned with estimation problems in which the data contains gross errors,
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationAn Optimal Regression Algorithm for Piecewise Functions Expressed as Object-Oriented Programs
2010 Ninth International Conference on Machine Learning and Applications An Optimal Regression Algorithm for Piecewise Functions Expressed as Object-Oriented Programs Juan Luo Department of Computer Science
More informationCSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.
CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct
More information6. Linear Discriminant Functions
6. Linear Discriminant Functions Linear Discriminant Functions Assumption: we know the proper forms for the discriminant functions, and use the samples to estimate the values of parameters of the classifier
More informationParameter Estimation and Model Order Identification of LTI Systems
Preprint, 11th IFAC Symposium on Dynamics and Control of Process Systems, including Biosystems Parameter Estimation and Model Order Identification of LTI Systems Santhosh Kumar Varanasi, Phanindra Jampana
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationLecture 13: Model selection and regularization
Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always
More informationIntelligent Methods in Modelling and Simulation of Complex Systems
SNE O V E R V I E W N OTE Intelligent Methods in Modelling and Simulation of Complex Systems Esko K. Juuso * Control Engineering Laboratory Department of Process and Environmental Engineering, P.O.Box
More information6 Model selection and kernels
6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)
More informationPrecise Piecewise Affine Models from Input-Output Data
Precise Piecewise Affine Models from Input-Output Data Rajeev Alur, Nimit Singhania University of Pennsylvania ABSTRACT Formal design and analysis of embedded control software relies on mathematical models
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-4: Constrained optimization Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428 June
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationPart 4. Decomposition Algorithms Dantzig-Wolf Decomposition Algorithm
In the name of God Part 4. 4.1. Dantzig-Wolf Decomposition Algorithm Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Introduction Real world linear programs having thousands of rows and columns.
More information1. Lecture notes on bipartite matching
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans February 5, 2017 1. Lecture notes on bipartite matching Matching problems are among the fundamental problems in
More informationLecture 19: November 5
0-725/36-725: Convex Optimization Fall 205 Lecturer: Ryan Tibshirani Lecture 9: November 5 Scribes: Hyun Ah Song Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not
More informationChapter 7: Numerical Prediction
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.
More informationAlternating Direction Method of Multipliers
Alternating Direction Method of Multipliers CS 584: Big Data Analytics Material adapted from Stephen Boyd (https://web.stanford.edu/~boyd/papers/pdf/admm_slides.pdf) & Ryan Tibshirani (http://stat.cmu.edu/~ryantibs/convexopt/lectures/21-dual-meth.pdf)
More informationData Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)
Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationPATTERN CLASSIFICATION AND SCENE ANALYSIS
PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationCluster Analysis. Ying Shen, SSE, Tongji University
Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group
More informationCS294-1 Assignment 2 Report
CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationClassification: Linear Discriminant Functions
Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions
More informationThe fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R
Journal of Machine Learning Research (2013) Submitted ; Published The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R Haotian Pang Han Liu Robert Vanderbei Princeton
More information3 Nonlinear Regression
CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic
More information5 Machine Learning Abstractions and Numerical Optimization
Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationRobust Kernel Methods in Clustering and Dimensionality Reduction Problems
Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust
More informationAn algorithm for censored quantile regressions. Abstract
An algorithm for censored quantile regressions Thanasis Stengos University of Guelph Dianqin Wang University of Guelph Abstract In this paper, we present an algorithm for Censored Quantile Regression (CQR)
More information3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm
3D Surface Recovery via Deterministic Annealing based Piecewise Linear Surface Fitting Algorithm Bing Han, Chris Paulson, and Dapeng Wu Department of Electrical and Computer Engineering University of Florida
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationAll lecture slides will be available at CSC2515_Winter15.html
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationComputation of the Constrained Infinite Time Linear Quadratic Optimal Control Problem
Computation of the Constrained Infinite Time Linear Quadratic Optimal Control Problem July 5, Introduction Abstract Problem Statement and Properties In this paper we will consider discrete-time linear
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationOptimization. Industrial AI Lab.
Optimization Industrial AI Lab. Optimization An important tool in 1) Engineering problem solving and 2) Decision science People optimize Nature optimizes 2 Optimization People optimize (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)
More informationMachine Learning / Jan 27, 2010
Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,
More informationUnlabeled Data Classification by Support Vector Machines
Unlabeled Data Classification by Support Vector Machines Glenn Fung & Olvi L. Mangasarian University of Wisconsin Madison www.cs.wisc.edu/ olvi www.cs.wisc.edu/ gfung The General Problem Given: Points
More informationCOMS 4771 Support Vector Machines. Nakul Verma
COMS 4771 Support Vector Machines Nakul Verma Last time Decision boundaries for classification Linear decision boundary (linear classification) The Perceptron algorithm Mistake bound for the perceptron
More information732A54/TDDE31 Big Data Analytics
732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks
More informationInteger Programming Theory
Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description
More informationTHE preceding chapters were all devoted to the analysis of images and signals which
Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to
More information3 Feature Selection & Feature Extraction
3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy
More informationAdvanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs
Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material
More informationComputational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions
Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................
More informationAlgebraic Iterative Methods for Computed Tomography
Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationSupport Vector Machines.
Support Vector Machines srihari@buffalo.edu SVM Discussion Overview 1. Overview of SVMs 2. Margin Geometry 3. SVM Optimization 4. Overlapping Distributions 5. Relationship to Logistic Regression 6. Dealing
More information3 Nonlinear Regression
3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationLouis Fourrier Fabien Gaie Thomas Rolf
CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted
More information