Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad.

Transcription:

/ To appear in Proc. of the 27th Annual Asilomar Conference on Signals, Systems and Computers, Nov. 1-3, 1993 /

Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition

Y. C. Pati
Information Systems Laboratory, Dept. of Electrical Engineering
Stanford University, Stanford, CA 94305

R. Rezaiifar and P. S. Krishnaprasad
Institute for Systems Research, Dept. of Electrical Engineering
University of Maryland, College Park, MD 20742

Abstract

In this paper we describe a recursive algorithm to compute representations of functions with respect to nonorthogonal and possibly overcomplete dictionaries of elementary building blocks, e.g. affine (wavelet) frames. We propose a modification to the Matching Pursuit algorithm of Mallat and Zhang (1992) that maintains full backward orthogonality of the residual (error) at every step and thereby leads to improved convergence. We refer to this modified algorithm as Orthogonal Matching Pursuit (OMP). It is shown that all additional computation required for the OMP algorithm may be performed recursively.

1 Introduction and Background

Given a collection of vectors D = {x_i} in a Hilbert space H, let us define V = Span{x_n} and W = V^⊥ (in H). We shall refer to D as a dictionary, and will assume the vectors x_n are normalized (‖x_n‖ = 1). In [3] Mallat and Zhang proposed an iterative algorithm, which they termed Matching Pursuit (MP), to construct representations of the form

    P_V f = \sum_n a_n x_n,    (1)

where P_V is the orthogonal projection operator onto V. Each iteration of the MP algorithm results in an intermediate representation of the form

    f = \sum_{i=1}^{k} a_i x_{n_i} + R_k f = f_k + R_k f,

where f_k is the current approximation and R_k f the current residual (error). Using the initial values R_0 f = f, f_0 = 0, and k = 0, the MP algorithm comprises the following steps:

(I) Compute the inner products {⟨R_k f, x_n⟩}_n.

(II) Find n_{k+1} such that

    |⟨R_k f, x_{n_{k+1}}⟩| ≥ α sup_j |⟨R_k f, x_j⟩|,

where 0 < α ≤ 1.

(III) Set

    f_{k+1} = f_k + ⟨R_k f, x_{n_{k+1}}⟩ x_{n_{k+1}},
    R_{k+1} f = R_k f − ⟨R_k f, x_{n_{k+1}}⟩ x_{n_{k+1}}.

(IV) Increment k (k ← k + 1), and repeat steps (I)-(IV) until some convergence criterion has been satisfied.

The proof of convergence [3] of MP relies essentially on the fact that ⟨R_{k+1} f, x_{n_{k+1}}⟩ = 0. This orthogonality of the residual to the last vector selected leads to the following "energy conservation" equation:

    \|R_k f\|^2 = \|R_{k+1} f\|^2 + |\langle R_k f, x_{n_{k+1}} \rangle|^2.    (2)

It has been noted that the MP algorithm may be derived as a special case of a technique known as Projection Pursuit (cf. [2]) in the statistics literature.
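For a finite dictionary stored as the columns of a matrix, steps (I)-(IV) take only a few lines of numpy. The following is an editorial sketch, not code from the paper; the function name, the choice α = 1 in step (II), and the fixed iteration count are ours. Coefficients are accumulated with +=, since MP may select the same atom more than once.

```python
import numpy as np

def matching_pursuit(X, f, n_iter=50):
    """Sketch of MP steps (I)-(IV). X is d-by-M with unit-norm
    columns x_n; f is the element of H = R^d to be approximated."""
    residual = f.astype(float).copy()          # R_0 f = f
    coeffs = np.zeros(X.shape[1])              # running a_n's (f_0 = 0)
    for _ in range(n_iter):                    # (IV) repeat
        inner = X.T @ residual                 # (I) <R_k f, x_n> for all n
        n_sel = int(np.argmax(np.abs(inner)))  # (II) with alpha = 1
        a = inner[n_sel]
        coeffs[n_sel] += a                     # (III) f_{k+1} = f_k + a x
        residual -= a * X[:, n_sel]            #       R_{k+1} f = R_k f - a x
    return coeffs, residual
```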

A shortcoming of the Matching Pursuit algorithm in its originally proposed form is that, although asymptotic convergence is guaranteed, the resulting approximation after any finite number of iterations will in general be suboptimal in the following sense. Let N < ∞ be the number of MP iterations performed. Thus we have

    f_N = \sum_{k=0}^{N-1} \langle R_k f, x_{n_{k+1}} \rangle x_{n_{k+1}}.

Define V_N = Span{x_{n_1}, ..., x_{n_N}}. We shall refer to f_N as an optimal N-term approximation if f_N = P_{V_N} f, i.e. if f_N is the best approximation we can construct using the selected subset {x_{n_1}, ..., x_{n_N}} of the dictionary D. (Note that this notion of optimality does not involve the problem of selecting an "optimal" N-element subset of the dictionary.) In this sense, f_N is an optimal N-term approximation if and only if R_N f ∈ V_N^⊥. As MP only guarantees that R_N f ⊥ x_{n_N}, the approximation f_N generated by MP will in general be suboptimal.

The difficulty with such suboptimality is easily illustrated by a simple example in R². Let x_1 and x_2 be two vectors in R², and take f ∈ R², as shown in Figure 1. Figure 1(b) is a plot of ‖R_k f‖² versus k. Hence, although asymptotic convergence is guaranteed, after any finite number of steps the error may still be quite large.

[Figure 1: Matching pursuit example in R². (a) Dictionary D = {x_1, x_2} and a vector f in R²; the original figure marks the angles 2π/3 and π/8. (b) Normalized error ‖R_k f‖² versus iteration number.]

In this paper we propose a refinement of the Matching Pursuit (MP) algorithm that we refer to as Orthogonal Matching Pursuit (OMP). For nonorthogonal dictionaries, OMP will in general converge faster than MP. For any finite-size dictionary of N elements, OMP converges to the projection onto the span of the dictionary elements in no more than N steps. Furthermore, after any finite number of iterations, OMP gives the optimal approximation with respect to the selected subset of the dictionary.* This is achieved by ensuring full backward orthogonality of the error, i.e. at each iteration R_k f ∈ V_k^⊥. For the example in Figure 1, OMP ensures convergence in exactly two iterations. It is also shown that the additional computation required for OMP takes a simple recursive form. We demonstrate the utility of OMP by example of applications to representing functions with respect to time-frequency localized affine wavelet dictionaries. We also compare the performance of OMP with that of MP on two numerical examples.

[*Footnote: A similar difficulty with the Projection Pursuit algorithm was noted by Donoho et al. [1], who suggested that backfitting may be used to improve the convergence of PPR. Although the technique is not fully described in [1], it appears to be in the same spirit as the technique we present here.]

2 Orthogonal Matching Pursuit

Assume we have the following kth-order model for f ∈ H:

    f = \sum_{n=1}^{k} a_n^k x_n + R_k f,  with ⟨R_k f, x_n⟩ = 0, n = 1, ..., k.    (3)

The superscript k in the coefficients a_n^k shows the dependence of these coefficients on the model order. We would like to update this kth-order model to a model of order k + 1,

    f = \sum_{n=1}^{k+1} a_n^{k+1} x_n + R_{k+1} f,  with ⟨R_{k+1} f, x_n⟩ = 0, n = 1, ..., k + 1.    (4)

Since elements of the dictionary D are not required to be orthogonal, to perform such an update we also require an auxiliary model for the dependence of x_{k+1} on the previous x_n's (n = 1, ..., k). Let

    x_{k+1} = \sum_{n=1}^{k} b_n^k x_n + \gamma_k,  with ⟨γ_k, x_n⟩ = 0, n = 1, ..., k.    (5)

Thus \sum_{n=1}^{k} b_n^k x_n = P_{V_k} x_{k+1}, and γ_k = P_{V_k^⊥} x_{k+1} is the component of x_{k+1} which is unexplained by {x_1, ..., x_k}. Using the auxiliary model (5), it may be shown that the correct update from the kth-order model to the model of order k + 1 is given by

    a_n^{k+1} = a_n^k − α_k b_n^k,  n = 1, ..., k,    (6)

and a_{k+1}^{k+1} = α_k, where

    \alpha_k = \frac{\langle R_k f, x_{k+1} \rangle}{\langle \gamma_k, x_{k+1} \rangle}
             = \frac{\langle R_k f, x_{k+1} \rangle}{\|\gamma_k\|^2}
             = \frac{\langle R_k f, x_{k+1} \rangle}{\|x_{k+1}\|^2 - \sum_{n=1}^{k} b_n^k \langle x_n, x_{k+1} \rangle}.
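In finite dimensions the auxiliary model (5) is a least-squares problem, and the update (6) can then be applied directly. Below is a small numpy sketch of one model-order update under that reading; the names Xk, a, and residual are ours, and the b_n^k are obtained with a generic least-squares solve rather than the recursion developed later in Section 2.3.

```python
import numpy as np

def omp_order_update(Xk, x_new, a, residual):
    """One update from the k-th order model (3) to order k+1 (4).
    Xk: d-by-k matrix of previously selected atoms; a: their
    coefficients a_n^k; residual: R_k f (orthogonal to range(Xk))."""
    # Auxiliary model (5): Xk @ b = P_{V_k} x_{k+1}, gamma = P_{V_k^perp} x_{k+1}.
    b, *_ = np.linalg.lstsq(Xk, x_new, rcond=None)
    gamma = x_new - Xk @ b
    # alpha_k = <R_k f, x_{k+1}> / ||gamma_k||^2, as derived above.
    alpha = (residual @ x_new) / (gamma @ gamma)
    a_next = np.append(a - alpha * b, alpha)   # update (6)
    residual_next = residual - alpha * gamma   # R_{k+1} f = R_k f - alpha_k gamma_k, cf. (7) below
    return a_next, residual_next
```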

It also follows that the residual R_{k+1} f satisfies

    R_k f = R_{k+1} f + \alpha_k \gamma_k,  and  \|R_k f\|^2 = \|R_{k+1} f\|^2 + \frac{|\langle R_k f, x_{k+1} \rangle|^2}{\|\gamma_k\|^2}.    (7)

2.1 The OMP Algorithm

The results of the previous section may be used to construct the following algorithm, which we will refer to as Orthogonal Matching Pursuit (OMP).

Initialization: f_0 = 0, R_0 f = f, D_0 = {}, x_0 = 0, a_0 = 0, k = 0.

(I) Compute {⟨R_k f, x_n⟩ : x_n ∈ D \ D_k}.

(II) Find x_{n_{k+1}} ∈ D \ D_k such that |⟨R_k f, x_{n_{k+1}}⟩| ≥ α sup_j |⟨R_k f, x_j⟩|, with 0 < α ≤ 1.

(III) If |⟨R_k f, x_{n_{k+1}}⟩| < δ (for some chosen δ > 0), then stop.

(IV) Reorder the dictionary D by applying the permutation k + 1 ↔ n_{k+1}.

(V) Compute {b_n^k}_{n=1}^{k} such that x_{k+1} = \sum_{n=1}^{k} b_n^k x_n + γ_k and ⟨γ_k, x_n⟩ = 0, n = 1, ..., k.

(VI) Set a_{k+1}^{k+1} = α_k = ‖γ_k‖^{-2} ⟨R_k f, x_{k+1}⟩.

(VII) Set a_n^{k+1} = a_n^k − α_k b_n^k, n = 1, ..., k, and update the model:

    f_{k+1} = \sum_{n=1}^{k+1} a_n^{k+1} x_n,
    R_{k+1} f = f − f_{k+1},
    D_{k+1} = D_k ∪ {x_{k+1}}.

Increment k (k ← k + 1) and repeat (I)-(VII).

2.2 Some Properties of OMP

As in the case of MP, convergence of OMP relies on an energy conservation equation, which now takes the form (7). The following theorem summarizes the convergence properties of OMP.

Theorem 2.1 For f ∈ H, let {R_k f} be the residuals generated by OMP. Then

(i) lim_{k→∞} R_k f = P_{V^⊥} f;
(ii) f_N = P_{V_N} f, N = 1, 2, ....

Proof: The proof of convergence parallels the proof of Theorem 1 in [3]. The proof of the second property follows immediately from the orthogonality conditions of Equation (3).

Remarks: The key difference between MP and OMP lies in Property (ii) of Theorem 2.1. Property (ii) implies that at the Nth step we have the best approximation we can get using the N vectors we have selected from the dictionary. Therefore, in the case of finite dictionaries of size M, OMP converges in no more than M iterations to the projection of f onto the span of the dictionary elements. As mentioned earlier, Matching Pursuit does not possess this property.

2.3 Some Computational Details

As in the case of MP, the inner products {⟨R_k f, x_j⟩} may be computed recursively. For OMP we may express these recursions implicitly in the formula

    \langle R_k f, x_j \rangle = \langle f - f_k, x_j \rangle = \langle f, x_j \rangle - \sum_{n=1}^{k} a_n^k \langle x_n, x_j \rangle.    (8)

The only additional computation required for OMP arises in determining the b_n^k's of the auxiliary model (5). To compute the b_n^k's, we rewrite the normal equations associated with (5) as the system of linear equations

    v_k = A_k b_k,    (9)

where

    v_k = [\langle x_{k+1}, x_1 \rangle, \langle x_{k+1}, x_2 \rangle, ..., \langle x_{k+1}, x_k \rangle]^T,
    b_k = [b_1^k, b_2^k, ..., b_k^k]^T,

and

    A_k = \begin{bmatrix}
        \langle x_1, x_1 \rangle & \langle x_2, x_1 \rangle & \cdots & \langle x_k, x_1 \rangle \\
        \langle x_1, x_2 \rangle & \langle x_2, x_2 \rangle & \cdots & \langle x_k, x_2 \rangle \\
        \vdots & \vdots & \ddots & \vdots \\
        \langle x_1, x_k \rangle & \langle x_2, x_k \rangle & \cdots & \langle x_k, x_k \rangle
    \end{bmatrix}.

Note that the positive constant δ used in Step (III) of OMP guarantees nonsingularity of the matrix A_k, hence we may write

    b_k = A_k^{-1} v_k.    (10)

However, since A_{k+1} may be written as

    A_{k+1} = \begin{bmatrix} A_k & v_k \\ v_k^* & 1 \end{bmatrix}    (11)

(where * denotes conjugate transpose), it may be shown using the block matrix inversion formula that

    A_{k+1}^{-1} = \begin{bmatrix} A_k^{-1} + \beta_k b_k b_k^* & -\beta_k b_k \\ -\beta_k b_k^* & \beta_k \end{bmatrix},    (12)

where β_k = 1 / (1 − v_k^* b_k). Hence A_{k+1}^{-1}, and therefore b_{k+1}, may be computed recursively using A_k^{-1} and b_k from the previous step.
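Steps (I)-(VII) together with the recursive inverse (10)-(12) can be assembled into a complete routine. The sketch below is our illustration under the paper's assumptions (real inner products, unit-norm atoms); in place of the reordering of Step (IV) it keeps an index list, and the names and default tolerance are ours.

```python
import numpy as np

def omp(X, f, n_iter, delta=1e-10):
    """OMP steps (I)-(VII), maintaining A_k^{-1} via (12).
    X: d-by-M dictionary with unit-norm columns; f: target vector."""
    residual = f.astype(float).copy()         # R_0 f = f
    sel = []                                  # indices of D_k (replaces Step IV)
    A_inv = np.array([[1.0]])                 # A_1^{-1} = [<x_1, x_1>]^{-1} = [1]
    a = np.zeros(0)
    for _ in range(n_iter):
        inner = X.T @ residual                # (I): <R_k f, x_n>; (8) gives an equivalent recursion
        inner[sel] = 0.0                      # restrict the search to D \ D_k
        n_new = int(np.argmax(np.abs(inner)))
        if abs(inner[n_new]) < delta:         # (III) stopping rule
            break
        if sel:
            v = X[:, sel].T @ X[:, n_new]     # v_k of (9)
            b = A_inv @ v                     # b_k = A_k^{-1} v_k, (10)
            beta = 1.0 / (1.0 - v @ b)        # beta_k of (12)
            A_inv = np.block([                # block inversion formula (12)
                [A_inv + beta * np.outer(b, b), -beta * b[:, None]],
                [-beta * b[None, :],            np.array([[beta]])],
            ])
        sel.append(n_new)
        # Coefficients a^{k+1} solve the normal equations, so that
        # f_{k+1} = P_{V_{k+1}} f and, per (VII), R_{k+1} f = f - f_{k+1}.
        a = A_inv @ (X[:, sel].T @ f)
        residual = f - X[:, sel] @ a
    return sel, a, residual
```

On exit, the residual is orthogonal to every selected atom, matching Property (ii) of Theorem 2.1, rather than only to the last one as in MP.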

3 Examples

In the following examples we consider representations with respect to an affine wavelet frame constructed from dilates and translates of the second derivative of a Gaussian, i.e. D = {ψ_{m,n} : m, n ∈ Z}, where

    \psi_{m,n}(x) = 2^{m/2} \psi(2^m x - n),

and the analyzing wavelet is given by

    \psi(x) = \frac{2}{\sqrt{3}} \pi^{-1/4} (1 - x^2) e^{-x^2/2}.

Note that for wavelet dictionaries the initial set of inner products {⟨f, ψ_{m,n}⟩} is readily computed by one convolution followed by sampling at each dilation level m. The dictionary used in these examples consists of a total of 35 vectors.

In our first example, both OMP and MP were applied to the signal shown in Figure 2. We see from Figure 2(b) that OMP clearly converges in far fewer iterations than MP. The squared magnitudes of the coefficients a_{m,n} of the resulting representation are shown in Figure 3.

[Figure 2: Example I. (a) Original signal f, with the OMP approximation superimposed. (b) Squared L² norm of the residual R_k f versus iteration number, for both OMP (solid line) and MP (dashed line).]

[Figure 3: Distribution of coefficients obtained by applying OMP in Example I, plotted against translation index and dilation index. Shading is proportional to the squared magnitude of the coefficients a_{m,n}, with dark colors indicating large magnitudes.]

We could also compare the two algorithms on the basis of the computational effort required to compute representations of signals to within a prespecified error. However, such a comparison can only be made for a given signal and dictionary, as the number of iterations required for each algorithm depends on both the signal and the dictionary. For example, for the signal of Example I, we see from Figure 4 that it is 3.5 to 8 times more expensive to achieve a prespecified error using OMP, even though OMP converges in fewer iterations. On the other hand, for the signal shown in Figure 5, which lies in the span of three dictionary vectors, it is approximately 2 times more expensive to apply MP. In this case OMP converges in exactly three iterations.

[Figure 4: Computational cost (FLOPS) versus approximation error for both OMP (solid line) and MP (dashed line) applied to the signal in Example I.]

[Figure 5: Example II. (a) Original signal f. (b) Squared L² norm of the residual R_k f versus iteration number, for both OMP (solid line) and MP (dashed line).]
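A discretized version of the dictionary of this section is easy to construct by sampling ψ_{m,n} on a grid. In the sketch below, the grid, the scale and shift ranges, and the renormalization of each sampled atom to unit norm are our choices for illustration, not the parameters used in the paper's examples.

```python
import numpy as np

def mexican_hat(x):
    """Analyzing wavelet: second derivative of a Gaussian,
    psi(x) = (2/sqrt(3)) * pi**(-1/4) * (1 - x^2) * exp(-x^2/2)."""
    return (2.0 / np.sqrt(3.0)) * np.pi ** (-0.25) * (1.0 - x**2) * np.exp(-(x**2) / 2.0)

def wavelet_dictionary(t, scales, shifts):
    """Columns are sampled psi_{m,n}(x) = 2^{m/2} psi(2^m x - n),
    renormalized to unit norm after sampling (as OMP assumes)."""
    atoms = []
    for m in scales:
        for n in shifts:
            atom = 2.0 ** (m / 2.0) * mexican_hat(2.0**m * t - n)
            norm = np.linalg.norm(atom)
            if norm > 1e-12:          # skip atoms that vanish on the grid
                atoms.append(atom / norm)
    return np.column_stack(atoms)

# Example usage on the interval [-25, 25] seen in Figures 2 and 5
# (grid density and index ranges are ours):
t = np.linspace(-25.0, 25.0, 512)
D = wavelet_dictionary(t, scales=range(-4, 3), shifts=range(-20, 21))
```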

4 Summary and Conclusions

In this paper we have described a recursive algorithm, which we refer to as Orthogonal Matching Pursuit (OMP), to compute representations of signals with respect to arbitrary dictionaries of elementary functions. The algorithm we have described is a modification of the Matching Pursuit (MP) algorithm of Mallat and Zhang [3] that improves convergence using an additional orthogonalization step. The main benefit of OMP over MP is the fact that it is guaranteed to converge in a finite number of steps for a finite dictionary. We also demonstrated that all additional computation required for OMP may be performed recursively. The two algorithms, MP and OMP, were compared on two simple examples of decomposition with respect to a wavelet dictionary. It was noted that although OMP converges in fewer iterations than MP, the computational effort required for each algorithm depends on both the class of signals and the choice of dictionary. Although we do not provide a rigorous argument here, it seems reasonable to conjecture that OMP will be computationally cheaper than MP for very redundant dictionaries, as knowledge of the redundancy is exploited in OMP to reduce the error as much as possible at each step.

Acknowledgements

This research of Y.C.P. was supported in part by NASA Headquarters, Center for Aeronautics and Space Information Sciences (CASIS) under Grant NAGW-419, S6, and in part by the Advanced Research Projects Agency of the Department of Defense monitored by the Air Force Office of Scientific Research under Contract F49620-93-1-0085. This research of R.R. and P.S.K. was supported in part by the Air Force Office of Scientific Research under Contract F49620-92-J-0500, the AFOSR University Research Initiative Program under Grant AFOSR-90-0105, by the Army Research Office under Smart Structures URI Contract no. DAAL03-92-G-0121, and by the National Science Foundation's Engineering Research Centers Program, NSFD CDR 8803012.

References

[1] D. Donoho, I. Johnstone, P. Rousseeuw, and W. Stahel. Discussion following the article by P. Huber. The Annals of Statistics, 13(2):496-500, 1985.

[2] P. J. Huber. Projection pursuit. The Annals of Statistics, 13(2):435-475, 1985.

[3] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. Preprint, submitted to IEEE Transactions on Signal Processing, 1992.