Mixed Linear System Estimation and Identification


48th IEEE Conference on Decision and Control, Shanghai, China, December 2009

Mixed Linear System Estimation and Identification

A. Zymnis  S. Boyd  D. Gorinevsky

(This material is based upon work supported by the Focus Center for Circuit & System Solutions (C2S2), by JPL award I291856, by the Precourt Institute on Energy Efficiency, by Army award W911NF-07-1-0029, by NSF award 0529426, by DARPA award N66001-06-C-2021, by NASA award NNX07AEIIA, by AFOSR award FA9550-06-1-0514, and by AFOSR award FA9550-06-1-0312. The authors are with the Department of Electrical Engineering, Stanford University; e-mail: {azymnis, boyd, gorn}@stanford.edu.)

Abstract: We consider a mixed linear system model, with both continuous and discrete inputs and outputs, described by a coefficient matrix and a set of noise variances. When the discrete inputs and outputs are absent, the model reduces to the usual noise-corrupted linear system. With discrete inputs only, the model has been used in fault estimation, and with discrete outputs only, the system reduces to a probit model. We consider two fundamental problems: estimating the model input, given the model parameters and the model output; and identifying the model parameters, given a training set of input-output pairs. The estimation problem leads to a mixed Boolean-convex optimization problem, which can be solved exactly when the number of discrete variables is small enough. In other cases the estimation problem can be solved approximately, by solving a convex relaxation, rounding, and possibly carrying out a local optimization step. The identification problem is convex and so can be solved exactly. Adding l_1 regularization to the identification problem allows us to trade off model fit and model parsimony. We illustrate the identification and estimation methods with a numerical example.

I. INTRODUCTION

A. System model

In this paper we introduce a model with multiple continuous (i.e., real valued) and discrete (i.e., Boolean) inputs and outputs. The continuous outputs are linearly related to the continuous and discrete inputs and are corrupted by zero mean Gaussian noise of known variance. The discrete outputs indicate whether a linear combination of the continuous and discrete inputs, corrupted by zero mean Gaussian noise, is above or below a known threshold. As such, the model that we consider is a hybrid generalized linear model (GLM) [1], [2], [3]. The model has the form

    y = A_cc x + A_dc d + b_c + v_c,            (1)
    z = pos(A_cd x + A_dd d + b_d + v_d),       (2)

where x ∈ R^{n_c} is the continuous input, d ∈ {0,1}^{n_d} is the discrete input, y ∈ R^{m_c} is the continuous measurement or output, z ∈ {0,1}^{m_d} is the discrete measurement or output, v_c ∈ R^{m_c} is the continuous noise term, and v_d ∈ R^{m_d} is the discrete noise term. The function pos : R^{m_d} → {0,1}^{m_d} acts elementwise, with

    pos(u)_i = 1 if u_i > 0, and pos(u)_i = 0 if u_i ≤ 0.

Note that for any positive diagonal matrix D, we have pos(Du) = pos(u). The noises are Gaussian, with all components independent, with

    (v_c)_i ~ N(0, σ_i²), i = 1, …, m_c,    (v_d)_i ~ N(0, 1), i = 1, …, m_d.

(We can assume the discrete noise components have unit variance without loss of generality, using a positive diagonal scaling.) The model is defined by the matrices A_cc, A_dc, A_cd, and A_dd, the intercept terms b_c and b_d, and the continuous noise variances σ_i², i = 1, …, m_c.

The model (1)–(2) includes several well known and widely used special cases. For m_d = 0 and n_d = 0 we have a simple linear model with additive Gaussian noise. For m_c = 0 and n_d = 0, with a single discrete output (m_d = 1), we obtain a probit model [1]. We mention several applications in §I-D.

In this paper we address two basic problems associated with this model: estimation and identification.
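To make the model concrete, the following minimal sketch simulates one sample from (1)–(2). It is not code from the paper; the dimensions, sparsity, and parameter values are arbitrary illustrative choices, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_c, n_d, m_c, m_d = 10, 10, 18, 16  # illustrative dimensions

# Model parameters (dense random draws here, purely for illustration).
A_cc = rng.normal(size=(m_c, n_c))
A_dc = rng.normal(size=(m_c, n_d))
A_cd = rng.normal(size=(m_d, n_c))
A_dd = rng.normal(size=(m_d, n_d))
b_c, b_d = rng.normal(size=m_c), rng.normal(size=m_d)
sigma = 0.01 * np.ones(m_c)  # continuous noise standard deviations

def pos(u):
    """Elementwise threshold: 1 if u_i > 0, else 0."""
    return (u > 0).astype(int)

# Inputs: x ~ N(0, I); each d_j is Bernoulli with prior probability 0.2.
x = rng.normal(size=n_c)
d = (rng.random(n_d) < 0.2).astype(int)

# Outputs per (1)-(2); the discrete noise has unit variance w.l.o.g.
y = A_cc @ x + A_dc @ d + b_c + sigma * rng.normal(size=m_c)
z = pos(A_cd @ x + A_dd @ d + b_d + rng.normal(size=m_d))
```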
B. Estimation

We first look at the problem of estimating the model inputs x and d, given one or more output samples. The prior distribution on the inputs is specified by a density p(x) for x, which we assume is log-concave, and the probability p_j that d_j = 1. (We assume that x and all d_j are independent.) We will see that the maximum a posteriori (MAP) estimate of (x, d) is the solution of a mixed Boolean convex problem. If n_d is small enough, this problem can be solved exactly, by exhaustive enumeration of the possible values of d, or by a branch-and-bound or other global optimization method. For other cases, we propose to solve the optimization problem approximately, by solving a convex relaxation and rounding the result, possibly followed by a local optimization step. We refer to the resulting estimate as the RMAP ("relaxed MAP") estimate. Numerical simulation suggests that the estimation performance of the RMAP estimator is quite similar to that of the MAP estimator; unlike the MAP estimate, however, it is computationally tractable even for very large problems [4].

C. Identification

We then look at the dual problem of fitting a model of the form (1)–(2), i.e., determining values for the model parameters, given a set of (training) data samples

    (x^(1), d^(1), y^(1), z^(1)), …, (x^(K), d^(K), y^(K), z^(K)).

We show that the associated log-likelihood function is a concave function of the model parameters, so maximum likelihood (ML) model fitting reduces to solving a convex optimization problem. To obtain a parsimonious model, i.e., one in which many of the entries of the parameter matrices are zero, we propose l_1-regularized ML fitting (which is also a convex problem). By varying a regularization parameter, we can trade off model fit and model parsimony [5], [6], [7], [8].

D. Prior and related work

The model that we consider is in essence a hybrid generalized linear model (GLM). These models have been extensively used for explanatory modelling. Some good references on GLMs are [1], [2], [3]. There is a considerable amount of research that deals with special cases of our problem. For example, when m_c = 0 and n_c = 0, we get a model which is very similar to a digraph model commonly used in diagnostics, e.g., see [9], [10].

The formulation in this paper is an extension of the earlier work of the authors [4], [11], [12]. In [4] we considered a special case of the current problem, in the context of fault identification. The paper [11] considers a dynamic system with continuous and discrete states and continuous outputs. The upcoming paper [12] considers a special case of sparse parametric inputs and discrete outputs, where the goal is to compute the sparsity pattern of the inputs.

II. ESTIMATION

In this section we consider the problem of estimating the most probable values of (x, d), given measurements (y, z). We first derive the log posterior probability of (x, d) given (y, z) and show that it is jointly concave in x and d (with d relaxed to take values in [0, 1]). Thus the problem of computing the maximum a posteriori (MAP) estimate of x and d is a mixed integer convex problem, i.e., a convex optimization problem with the additional constraint that some of the variables take values in {0, 1}. We then present an efficient heuristic for solving this problem approximately, based on a convex relaxation of the combinatorial MAP problem. Our analysis follows closely the previous work of the authors on fault detection [4].

The method that we present is readily extended to the case when we have multiple measurements (y^(1), z^(1)), …, (y^(M), z^(M)) for the same input (x, d): we just have to stack all these measurements and augment the system model equations appropriately.

A. Maximum a posteriori estimation

a) Log posterior: From Bayes' rule the posterior probability (density) of x and d given y and z is

    p(x, d | y, z) ∝ p(y | x, d) p(z | x, d) p(d) p(x).

Taking logarithms, we have

    log p(x, d | y, z) = l(x, d) + C,

where C is a constant and

    l(x, d) = l_mc(x, d) + l_md(x, d) + l_pd(d) + l_pc(x),    (3)

with the terms described below. The posterior contribution due to the continuous measurements is

    l_mc(x, d) = −Σ_{i=1}^{m_c} (1/(2σ_i²)) (y_i − (A_cc x + A_dc d + b_c)_i)².

The posterior contribution due to the discrete measurements is

    l_md(x, d) = Σ_{i=1}^{m_d} z_i log Φ((A_cd x + A_dd d + b_d)_i)
               + Σ_{i=1}^{m_d} (1 − z_i) log Φ(−(A_cd x + A_dd d + b_d)_i),    (4)

where Φ is the cumulative distribution function of a standard Gaussian. The discrete variable prior term is l_pd(d) = λ^T d, where λ_j = log(p_j/(1 − p_j)). The continuous variable prior term is l_pc(x) = log p(x).

The log posterior (3) is jointly concave in x and d, when d is relaxed to take values in [0, 1]^{n_d}. Indeed, each of the terms is concave in x and d: l_mc is a concave quadratic in x and d, l_pd is a linear function of d, and log p(x) is concave by assumption. Concavity of l_md follows from log-concavity of Φ; see, e.g., [13, §3.5.2]. Figure 1 shows a plot of log Φ(x) versus x, for x ranging between −5 and 5.
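The log posterior (3) is straightforward to evaluate numerically. The helper below is my own sketch, not the authors' code; it assumes a standard Gaussian prior on x and uses scipy's numerically stable log Φ.

```python
import numpy as np
from scipy.stats import norm

def log_posterior(x, d, y, z, A_cc, A_dc, A_cd, A_dd, b_c, b_d, sigma, p):
    """Evaluate l(x, d) of (3), up to the additive constant C."""
    # l_mc: Gaussian log-likelihood of the continuous measurements.
    r = y - (A_cc @ x + A_dc @ d + b_c)
    l_mc = -np.sum(r**2 / (2 * sigma**2))
    # l_md: probit terms; norm.logcdf is a numerically stable log Phi.
    u = A_cd @ x + A_dd @ d + b_d
    l_md = np.sum(z * norm.logcdf(u) + (1 - z) * norm.logcdf(-u))
    # Priors: lambda_j = log(p_j / (1 - p_j)) for d, and an assumed N(0, I) for x.
    l_pd = np.log(p / (1 - p)) @ d
    l_pc = -0.5 * np.sum(x**2)
    return l_mc + l_md + l_pd + l_pc
```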

[Fig. 1: Plot of log Φ(x) versus x.]

b) MAP estimation: The problem of estimating the most probable input variables x and d, given the outputs y and z, can be cast as the following optimization problem:

    maximize    l(x, d)
    subject to  d ∈ {0, 1}^{n_d},    (5)

with variables x ∈ R^{n_c} and d ∈ {0,1}^{n_d}. This is a mixed integer convex problem. One straightforward method for solving it is to enumerate all 2^{n_d} possible values of d, and to find the optimal x in each case, which is tractable, since for fixed d the problem is convex in x [13]. This approach is not practical for n_d larger than around 15 or so, or smaller if the other dimensions are large. A branch and bound method, or other global optimization technique, can be used to (possibly) speed up computation of the globally optimal solution [14], [15]. But the worst case complexity of any method that computes the global solution is exponential in n_d. For this reason we need to consider heuristics for solving problem (5) approximately.

B. Relaxed MAP estimation

In this section we describe a heuristic for approximately solving the MAP problem (5). Our heuristic is based on replacing the hard constraints d_j ∈ {0, 1} with soft constraints d_j ∈ [0, 1]. This results in a convex optimization problem that we can solve efficiently. We follow this by rounding and (possibly) a simple local optimization method to improve our estimate.

c) Linear relaxation: We relax problem (5) to the following optimization problem:

    maximize    l(x, d)
    subject to  0 ≤ d ≤ 1,    (6)

with variables x ∈ R^{n_c} and d ∈ R^{n_d}. This is a convex optimization problem, since it involves maximizing a concave function over a convex set. We can efficiently solve this problem in many ways, e.g., via interior-point methods [16], [13]. The complexity of such methods can be shown to be cubic in n_c + n_d (assuming a fixed number of iterations of an interior-point method).

Since the feasible set of the relaxed MAP problem (6) contains the feasible set of the MAP problem (5), the optimal value of the relaxed MAP problem, which we denote l_ub, gives an upper bound on the optimal value of the MAP problem. Let (x*, d*) be an optimal point for the relaxed MAP problem (6), so we have l_ub = l(x*, d*) ≥ l(x, d) for any Boolean d. If d* is also Boolean, i.e., d*_j ∈ {0, 1} for all j, we conclude that d* is in fact optimal for the MAP problem. In other words: when a solution of the relaxed MAP problem turns out to have Boolean entries, it is optimal for the MAP problem. In general, of course, this does not happen; at least some values of d*_j will lie strictly between 0 and 1.

d) Rounding: Let (x*, d*) denote the optimal point of problem (6). We refer to d* as a soft decision, since its components can be strictly between 0 and 1. The next step is to round the soft decision d* to obtain a valid Boolean solution (or hard decision) for d. Let θ ∈ (0, 1) and set

    d̂ = pos(d* − θ).

To create d̂, we simply round all entries of d* smaller than the threshold θ to zero. Thus θ is a threshold for guessing that a discrete input variable is 1, based on the relaxed MAP solution d*. As θ varies from 0 to 1, this method generates up to n_d different estimates d̂, as each entry of d* falls below the threshold. We can efficiently find them all by sorting the entries of d* and setting the entries of d̂ to one in order of decreasing d*_j. We evaluate the log posterior for each of these (or a subset) by solving the optimization problem

    maximize    l(x, d̂),    (7)

with variable x ∈ R^{n_c}. This is an unconstrained convex problem that can also be solved efficiently, again with cubic complexity in n_c.
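The relaxation (6) and the rounding sweep are easy to prototype in a modeling language. The sketch below uses cvxpy and its log_normcdf atom (a concave approximation of log Φ available in recent cvxpy versions); it again assumes a standard Gaussian prior on x, and is an illustration rather than the authors' implementation.

```python
import cvxpy as cp
import numpy as np

def rmap_estimate(y, z, A_cc, A_dc, A_cd, A_dd, b_c, b_d, sigma, p):
    """Relaxed MAP (6), then the threshold-rounding sweep over problem (7)."""
    n_c, n_d = A_cc.shape[1], A_dc.shape[1]
    lam = np.log(p / (1 - p))

    def l(x, d):  # log posterior, up to a constant
        u = A_cd @ x + A_dd @ d + b_d
        l_mc = -cp.sum_squares(cp.multiply(1 / (np.sqrt(2) * sigma),
                                           y - (A_cc @ x + A_dc @ d + b_c)))
        l_md = cp.sum(cp.multiply(z, cp.log_normcdf(u))
                      + cp.multiply(1 - z, cp.log_normcdf(-u)))
        return l_mc + l_md + lam @ d - 0.5 * cp.sum_squares(x)

    # Solve the convex relaxation (6).
    x, d = cp.Variable(n_c), cp.Variable(n_d)
    cp.Problem(cp.Maximize(l(x, d)), [d >= 0, d <= 1]).solve()
    d_star = d.value

    # Rounding sweep: evaluate the hard decisions induced by each threshold.
    best_val, best_x, best_d = -np.inf, None, None
    for theta in np.unique(np.append(d_star, 0.5)):
        d_hat = (d_star > theta).astype(float)
        x_hat = cp.Variable(n_c)
        val = cp.Problem(cp.Maximize(l(x_hat, d_hat))).solve()  # problem (7)
        if val > best_val:
            best_val, best_x, best_d = val, x_hat.value, d_hat
    return best_x, best_d, best_val
```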
The RMAP continuous variable estimate x̂ is obtained as the solution of (7) corresponding to the best obtained estimate d̂.

e) Local optimization: Further improvement in our estimate can sometimes be obtained by a local optimization method. We describe here the simplest possible such method. We initialize d̂ as the rounded estimate which results in the largest value of l(x, d). We then cycle through j = 1, …, n_d, at step j replacing d̂_j with 1 − d̂_j. If this leads to an increase in the optimal value of problem (7), we accept the change and continue. If (as is usually the case) flipping the jth bit results in a decrease in l, we go on to the next index. We continue until we have rejected changes in all entries of d̂. (At this point we can be sure that d̂ is 1-OPT, which means that no change in a single entry will improve the objective.) Numerical experiments show that this local optimization method often has no effect, which means that the rounded solution is 1-OPT. In some cases, however, it can lead to a modest increase in l.

f) Performance of RMAP: The performance of our estimate of (x, d) should be judged by (for example) the mean-square error in estimating x, and the probability of making errors in estimating d (which could be further broken down into false positive and false negative error rates). Numerical examples show that RMAP has very similar performance to MAP, but has the advantage of tractability. This can be partially explained as follows. When the estimation problem is hard, for example when the noise levels are high, no estimation method (and in particular, neither MAP nor RMAP) can do a good job of estimating x and d. When the estimation problem is easy, for example when the noise levels are low, even simple estimation methods (including RMAP) can do a good job of estimating x and d. So it is only on problems between hard and easy that we could possibly see a significant difference in estimation performance between MAP and RMAP. In this region, however, we observe from numerical experiments that MAP and RMAP achieve very similar performance.

III. IDENTIFICATION

In §II we addressed the problem of estimating (x, d) given measurements (y, z) when the system model is known. In this section we look at the dual problem of fitting a model of the form (1)–(2) to given data, assuming that the continuous noise variances are known. We show that the log likelihood of the model parameters given the measurements is a concave function, so we can solve the maximum likelihood (ML) problem efficiently. We first observe that, since all measurements are independent of each other, we can separately identify the model parameters that correspond to each measurement. The complexity of the resulting ML identification technique is thus linear in the total number of measurements. Finally, we present a simple technique, variously known as compressed sensing [5], [6], [7], the Lasso [17], [18], sparse signal recovery [19], and basis pursuit [20], that can be used to identify parsimonious models that fit the data well. This involves using the l_1 norm of the parameter vector of a given linear model as a surrogate for the model sparsity.

A. Continuous parameter identification

Suppose that we are given samples of the form (x^(j), d^(j), y^(j)) for j = 1, …, K. Let a_{cc,i}^T and a_{dc,i}^T denote the ith rows of A_cc and A_dc respectively. Since the continuous noise terms v_c are Gaussian, the ML estimate of (a_{cc,i}, a_{dc,i}, b_{c,i}) given the data is the one that maximizes

    l_{c,i}(a_{cc,i}, a_{dc,i}, b_{c,i}) = −Σ_{j=1}^{K} (y_i^(j) − a_{cc,i}^T x^(j) − a_{dc,i}^T d^(j) − b_{c,i})².

We can evaluate the maximum likelihood estimates of these model parameters efficiently (via least squares), if σ_i is known. In order to estimate a parsimonious model, as well as σ_i, we propose solving the following problem:

    maximize    l_{c,i}(a_{cc,i}, a_{dc,i}, b_{c,i}) − µ_{c,i} (‖a_{cc,i}‖₁ + ‖a_{dc,i}‖₁),    (8)

which is a convex optimization problem, for a fixed parameter µ_{c,i} > 0. Having solved problem (8), we can then estimate the noise variance σ_i² as the variance of the measurement residuals y_i^(j) − a_{cc,i}^T x^(j) − a_{dc,i}^T d^(j) − b_{c,i}.
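For a single continuous sensor, problem (8) is an l_1-regularized least squares problem, so a few lines of cvxpy suffice. This is a sketch with hypothetical argument names (stacked data matrices X and D), not the authors' code.

```python
import cvxpy as cp
import numpy as np

def fit_continuous_row(X, D, y_i, mu):
    """Solve problem (8) for one continuous sensor.

    X: K x n_c stack of x^(j); D: K x n_d stack of d^(j);
    y_i: K-vector of sensor i's measurements; mu: regularization weight.
    """
    a_cc = cp.Variable(X.shape[1])
    a_dc = cp.Variable(D.shape[1])
    b = cp.Variable()
    resid = y_i - (X @ a_cc + D @ a_dc + b)
    # Note the intercept b is not penalized, matching (8).
    obj = cp.Minimize(cp.sum_squares(resid)
                      + mu * (cp.norm1(a_cc) + cp.norm1(a_dc)))
    cp.Problem(obj).solve()
    r = y_i - (X @ a_cc.value + D @ a_dc.value + b.value)
    return a_cc.value, a_dc.value, b.value, np.var(r)  # variance from residuals
```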
B. Discrete parameter identification

Now suppose that we are given samples of the form (x^(j), d^(j), z^(j)) for j = 1, …, K. Let a_{cd,i}^T and a_{dd,i}^T denote the ith rows of A_cd and A_dd respectively. The log likelihood of (a_{cd,i}, a_{dd,i}, b_{d,i}) given the data is

    l_{d,i}(a_{cd,i}, a_{dd,i}, b_{d,i}) = Σ_{j=1}^{K} [ z_i^(j) log Φ(a_{cd,i}^T x^(j) + a_{dd,i}^T d^(j) + b_{d,i})
                                          + (1 − z_i^(j)) log Φ(−a_{cd,i}^T x^(j) − a_{dd,i}^T d^(j) − b_{d,i}) ],

which is a concave function of (a_{cd,i}, a_{dd,i}, b_{d,i}). We can thus evaluate the ML estimates of the discrete model parameters efficiently, for example using Newton's method. This is equivalent to solving a probit regression problem. In order to estimate a parsimonious model, we propose solving the following problem:

    maximize    l_{d,i}(a_{cd,i}, a_{dd,i}, b_{d,i}) − µ_{d,i} (‖a_{cd,i}‖₁ + ‖a_{dd,i}‖₁),    (9)

which is a convex optimization problem, for a fixed parameter µ_{d,i} > 0.

C. Computational complexity

Problems (8) and (9) can be solved efficiently by a variety of methods, such as interior-point methods (in a way similar to [21], [8]) and first order methods, e.g., [22]. In either case the complexity of solving each such problem is cubic in n_c + n_d and linear in K. Thus the overall complexity of l_1-regularized ML system identification is O(K(m_c + m_d)(n_c + n_d)³).
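Problem (9) is l_1-regularized probit regression. A minimal cvxpy sketch, again leaning on the log_normcdf atom and the same hypothetical stacked-data names, might look as follows.

```python
import cvxpy as cp

def fit_discrete_row(X, D, z_i, mu):
    """Solve problem (9) for one discrete sensor.

    X: K x n_c stack of x^(j); D: K x n_d stack of d^(j);
    z_i: K-vector of 0/1 measurements; mu: regularization weight.
    """
    a_cd = cp.Variable(X.shape[1])
    a_dd = cp.Variable(D.shape[1])
    b = cp.Variable()
    u = X @ a_cd + D @ a_dd + b
    # Concave probit log-likelihood, via cvxpy's log_normcdf approximation.
    ll = cp.sum(cp.multiply(z_i, cp.log_normcdf(u))
                + cp.multiply(1 - z_i, cp.log_normcdf(-u)))
    obj = cp.Maximize(ll - mu * (cp.norm1(a_cd) + cp.norm1(a_dd)))
    cp.Problem(obj).solve()
    return a_cd.value, a_dd.value, b.value
```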

D. Choice of regularization parameter

The regularization parameters µ_{c,i} and µ_{d,i} control the tradeoff between data fit (as measured by the log likelihood) and model sparsity (as measured by the l_1 norm). In order to keep our modelling procedure as flexible as possible, we use a different regularization parameter for each continuous and discrete sensor. We use cross validation to choose each of the regularization parameters µ_{c,i} and µ_{d,i}. We divide our available data into a training set and a test set. For each continuous sensor i = 1, …, m_c and for each value of µ, we solve problem (8) on the training set and measure the average square residual of measurement i on the test set. We then choose as µ_{c,i} the value of µ that gives the smallest residual. We repeat this process for the discrete sensors, where instead of the square residual we use the average error rate in the discrete sensor measurement.

This is the simplest possible way of fitting the regularization parameters. There are various other methods, such as minimizing the Akaike information criterion (AIC), or minimizing the Bayesian information criterion (BIC). For a detailed description of these methods see [3, §7].

IV. NUMERICAL EXAMPLE

In this section we present the results of applying our estimation and identification methods to a small artificially generated example. Specifically, we consider a system with n_c = n_d = 10, m_c = 18, and m_d = 16. We draw the elements of all system matrices randomly, such that each of the matrices A_cc, A_dc, A_cd, and A_dd is 10% sparse. We draw the nonzero entries of the A_cc and A_dc matrices from a N(0, 1) distribution, and the nonzero entries of the A_cd and A_dd matrices from a N(0, 1) distribution. The entries of b_c and b_d are drawn from a N(0, 1) and a N(0, 100) distribution respectively. We set σ_i = 0.01 for all i. Each element of the input x is generated according to a N(0, 1) prior. The prior probability of d_j being equal to 1 is 0.2 for all j.

We generate three sets of 500 samples: a training set, which we use to fit our estimated model; a validation set, which we use to select the best value of the regularization parameter µ for each sensor; and a test set, on which we judge the accuracy of our resulting model. We look at 30 candidate values of µ, logarithmically spaced in the range (10⁻², 10³). For each value of µ, we use the method described in §III to fit a mixed linear model to the given training set. For each continuous sensor, we choose the value of µ which achieves the minimum relative root mean square error r_y on the validation set, defined as

    r_y = ‖y − ŷ‖₂ / ‖y‖₂,

where ŷ is the output predicted by the estimated model, given the true input. For each discrete sensor, we choose the value of µ that achieves the minimum average error rate e_z on the predicted value of z for the validation set. As an example, figure 2 shows the plot of e_z versus µ for discrete sensor 5.

[Fig. 2: Plot of e_z as a function of µ on the validation set for discrete sensor 5.]

We then use this model estimate to predict the inputs (x, d) from the outputs (y, z) of the test set, using the relaxed MAP method described in §II. We compute the error rate in our estimate of d (which we call e_d), as well as the average relative root-mean-square (RMS) error between our estimate and the true value of x, defined as

    r_x = ‖x − x̂‖₂ / ‖x‖₂.

Our estimated model yields an error rate e_d of about 2.2×10⁻³ on this data, and a relative RMS error in x of about 4.7×10⁻². In contrast, using the true model for estimation yields an error rate e_d of about 2.0×10⁻³ and a relative RMS error in x of about 4.1×10⁻².
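The per-sensor sweep over µ is simple to script. The sketch below reuses the hypothetical fit_discrete_row helper from the previous sketch to pick µ_{d,i} for one discrete sensor by validation error rate, mirroring the procedure behind figure 2; it is illustrative only.

```python
import numpy as np

def choose_mu_discrete(X_tr, D_tr, z_tr_i, X_va, D_va, z_va_i):
    """Cross-validate the regularization weight for one discrete sensor."""
    best_mu, best_err = None, np.inf
    for mu in np.logspace(-2, 3, 30):  # 30 candidates in [1e-2, 1e3], as in the example
        a_cd, a_dd, b = fit_discrete_row(X_tr, D_tr, z_tr_i, mu)
        # With unit-variance noise, the most likely z is 1 exactly when u > 0.
        z_hat = (X_va @ a_cd + D_va @ a_dd + b > 0).astype(int)
        err = np.mean(z_hat != z_va_i)  # validation error rate e_z
        if err < best_err:
            best_mu, best_err = mu, err
    return best_mu, best_err
```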
We thus see that our system identification method does a reasonable job of fitting such a model to the given data. The modelling and estimation performance of the estimated model are shown in tables I and II respectively. We judge the modelling performance based on how well the model can predict the outputs from the inputs. As we can see from table I, the model does a good job at this task, given that the values of r_y and e_z for the test set are of the same order of magnitude as the corresponding values for the training and validation sets. Furthermore, from table II, we see that the model also generalizes well for estimation. The values of r_x and e_d for the test set are of the same order as the ones for the training and validation sets.

          Train     Validation    Test
    r_y   0.0056    0.0058        0.0057
    e_z   0.0063    0.0080        0.0110

TABLE I: Modelling performance of estimated model.

          Train     Validation    Test
    r_x   0.0500    0.0470        0.0460
    e_d   0.0052    0.0048        0.0022

TABLE II: Estimation performance of estimated model.

V. CONCLUSIONS

We have introduced a class of mixed linear systems that includes the standard linear model and the probit model as special cases. We have presented a simple heuristic for estimating the input given the output of such a system, based on a convex relaxation of a combinatorial MAP problem. We have also presented a simple method that uses l_1-regularized ML to identify such a model given a set of data points. We have shown, through a numerical example, that these methods perform well in practice.

REFERENCES

[1] P. McCullagh and J. Nelder, Generalized Linear Models. Chapman & Hall, 1989.
[2] T. Hastie and R. Tibshirani, Generalized Additive Models. Chapman & Hall, 1990.
[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[4] A. Zymnis, S. Boyd, and D. Gorinevsky, "Relaxed maximum a posteriori fault identification," Signal Processing, vol. 89, no. 6, pp. 989–999, 2009.
[5] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[6] E. Candès, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2005.
[7] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[8] K. Koh, S.-J. Kim, and S. Boyd, "An interior point method for large-scale l1-regularized logistic regression," Journal of Machine Learning Research, vol. 8, pp. 1519–1555, July 2007.
[9] I. Sacks, "Digraph matrix analysis," IEEE Transactions on Reliability, vol. 34, no. 5, pp. 437–446, 1985.
[10] S. Deb, K. Pattipati, V. Raghavan, M. Shakeri, and R. Shrestha, "Multi-signal flow graphs: a novel approach for system testability analysis and fault diagnosis," IEEE Aerospace and Electronic Systems Magazine, vol. 10, no. 5, pp. 14–25, 1995.
[11] A. Zymnis, S. Boyd, and D. Gorinevsky, "Mixed state estimation for a linear Gaussian Markov model," in 47th IEEE Conference on Decision and Control, 2008, pp. 3219–3226.
[12] A. Zymnis, S. Boyd, and E. Candès, "Compressed sensing with quantized measurements," 2009, in preparation.
[13] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[14] E. Lawler and D. Wood, "Branch-and-bound methods: A survey," Operations Research, vol. 14, pp. 699–719, 1966.
[15] R. Moore, "Global optimization to prescribed accuracy," Computers and Mathematics with Applications, vol. 21, no. 6–7, pp. 25–39, 1991.
[16] J. Nocedal and S. Wright, Numerical Optimization. Springer, 1999.
[17] R. Tibshirani, "Regression shrinkage and selection via the Lasso," Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267–288, 1996.
[18] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, pp. 407–451, 2004.
[19] J. Tropp, "Just relax: Convex programming methods for identifying sparse signals in noise," IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 1030–1051, 2006.
[20] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, pp. 129–159, 2001.
[21] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An interior-point method for large-scale l1-regularized least squares," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606–617, 2007.
[22] E. Hale, W. Yin, and Y. Zhang, "Fixed-point continuation for l1-minimization: Methodology and convergence," SIAM Journal on Optimization, vol. 19, pp. 1107–1130, 2008.