Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad.

Transcription:

/ To appear in Proc. of the 27th Annual Asilomar Conference on Signals, Systems and Computers, Nov. 1-3, 1993 /

Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition

Y. C. Pati
Information Systems Laboratory, Dept. of Electrical Engineering
Stanford University, Stanford, CA 94305

R. Rezaiifar and P. S. Krishnaprasad
Institute for Systems Research, Dept. of Electrical Engineering
University of Maryland, College Park, MD 20742

Abstract

In this paper we describe a recursive algorithm to compute representations of functions with respect to nonorthogonal and possibly overcomplete dictionaries of elementary building blocks, e.g. affine (wavelet) frames. We propose a modification to the Matching Pursuit algorithm of Mallat and Zhang (1992) that maintains full backward orthogonality of the residual (error) at every step and thereby leads to improved convergence. We refer to this modified algorithm as Orthogonal Matching Pursuit (OMP). It is shown that all additional computation required for the OMP algorithm may be performed recursively.

1 Introduction and Background

Given a collection of vectors D = {x_i} in a Hilbert space H, let us define V = Span{x_n} and W = V^⊥ (in H). We shall refer to D as a dictionary, and will assume the vectors x_n are normalized (‖x_n‖ = 1). In [3] Mallat and Zhang proposed an iterative algorithm, which they termed Matching Pursuit (MP), to construct representations of the form

    P_V f = \sum_n a_n x_n,    (1)

where P_V is the orthogonal projection operator onto V. Each iteration of the MP algorithm results in an intermediate representation of the form

    f = \sum_{i=1}^{k} a_i x_{n_i} + R_k f = f_k + R_k f,

where f_k is the current approximation and R_k f the current residual (error). Using the initial values R_0 f = f, f_0 = 0, and k = 0, the MP algorithm comprises the following steps:

(I) Compute the inner products {⟨R_k f, x_n⟩}_n.

(II) Find n_{k+1} such that

    |⟨R_k f, x_{n_{k+1}}⟩| ≥ α sup_j |⟨R_k f, x_j⟩|,

where 0 < α ≤ 1.

(III) Set

    f_{k+1} = f_k + ⟨R_k f, x_{n_{k+1}}⟩ x_{n_{k+1}},
    R_{k+1} f = R_k f − ⟨R_k f, x_{n_{k+1}}⟩ x_{n_{k+1}}.

(IV) Increment k (k ← k + 1), and repeat steps (I)-(IV) until some convergence criterion has been satisfied.

The proof of convergence [3] of MP relies essentially on the fact that ⟨R_{k+1} f, x_{n_{k+1}}⟩ = 0. This orthogonality of the residual to the last vector selected leads to the following "energy conservation" equation:

    \|R_k f\|^2 = \|R_{k+1} f\|^2 + |\langle R_k f, x_{n_{k+1}} \rangle|^2.    (2)

It has been noted that the MP algorithm may be derived as a special case of a technique known as Projection Pursuit (cf. [2]) in the statistics literature.
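For a finite dictionary stored as the columns of a matrix, steps (I)-(IV) take only a few lines of numpy. The following is an editorial sketch, not code from the paper; the function name, the choice α = 1 in step (II), and the fixed iteration count are ours. Coefficients are accumulated with +=, since MP may select the same atom more than once.

```python
import numpy as np

def matching_pursuit(X, f, n_iter=50):
    """Sketch of MP steps (I)-(IV). X is d-by-M with unit-norm
    columns x_n; f is the element of H = R^d to be approximated."""
    residual = f.astype(float).copy()          # R_0 f = f
    coeffs = np.zeros(X.shape[1])              # running a_n's (f_0 = 0)
    for _ in range(n_iter):                    # (IV) repeat
        inner = X.T @ residual                 # (I) <R_k f, x_n> for all n
        n_sel = int(np.argmax(np.abs(inner)))  # (II) with alpha = 1
        a = inner[n_sel]
        coeffs[n_sel] += a                     # (III) f_{k+1} = f_k + a x
        residual -= a * X[:, n_sel]            #       R_{k+1} f = R_k f - a x
    return coeffs, residual
```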

A shortcoming of the Matching Pursuit algorithm in its originally proposed form is that, although asymptotic convergence is guaranteed, the resulting approximation after any finite number of iterations will in general be suboptimal in the following sense. Let N < ∞ be the number of MP iterations performed. Thus we have

    f_N = \sum_{k=0}^{N-1} \langle R_k f, x_{n_{k+1}} \rangle x_{n_{k+1}}.

Define V_N = Span{x_{n_1}, ..., x_{n_N}}. We shall refer to f_N as an optimal N-term approximation if f_N = P_{V_N} f, i.e. if f_N is the best approximation we can construct using the selected subset {x_{n_1}, ..., x_{n_N}} of the dictionary D. (Note that this notion of optimality does not involve the problem of selecting an "optimal" N-element subset of the dictionary.) In this sense, f_N is an optimal N-term approximation if and only if R_N f ∈ V_N^⊥. As MP only guarantees that R_N f ⊥ x_{n_N}, the approximation f_N generated by MP will in general be suboptimal.

The difficulty with such suboptimality is easily illustrated by a simple example in R². Let x_1 and x_2 be two vectors in R², and take f ∈ R², as shown in Figure 1. Figure 1(b) is a plot of ‖R_k f‖² versus k. Hence, although asymptotic convergence is guaranteed, after any finite number of steps the error may still be quite large.

[Figure 1: Matching pursuit example in R². (a) Dictionary D = {x_1, x_2} and a vector f in R²; the original figure marks the angles 2π/3 and π/8. (b) Normalized error ‖R_k f‖² versus iteration number.]

In this paper we propose a refinement of the Matching Pursuit (MP) algorithm that we refer to as Orthogonal Matching Pursuit (OMP). For nonorthogonal dictionaries, OMP will in general converge faster than MP. For any finite-size dictionary of N elements, OMP converges to the projection onto the span of the dictionary elements in no more than N steps. Furthermore, after any finite number of iterations, OMP gives the optimal approximation with respect to the selected subset of the dictionary.* This is achieved by ensuring full backward orthogonality of the error, i.e. at each iteration R_k f ∈ V_k^⊥. For the example in Figure 1, OMP ensures convergence in exactly two iterations. It is also shown that the additional computation required for OMP takes a simple recursive form. We demonstrate the utility of OMP by example of applications to representing functions with respect to time-frequency localized affine wavelet dictionaries. We also compare the performance of OMP with that of MP on two numerical examples.

[*Footnote: A similar difficulty with the Projection Pursuit algorithm was noted by Donoho et al. [1], who suggested that backfitting may be used to improve the convergence of PPR. Although the technique is not fully described in [1], it appears to be in the same spirit as the technique we present here.]

2 Orthogonal Matching Pursuit

Assume we have the following kth-order model for f ∈ H:

    f = \sum_{n=1}^{k} a_n^k x_n + R_k f,  with ⟨R_k f, x_n⟩ = 0, n = 1, ..., k.    (3)

The superscript k in the coefficients a_n^k shows the dependence of these coefficients on the model order. We would like to update this kth-order model to a model of order k + 1,

    f = \sum_{n=1}^{k+1} a_n^{k+1} x_n + R_{k+1} f,  with ⟨R_{k+1} f, x_n⟩ = 0, n = 1, ..., k + 1.    (4)

Since elements of the dictionary D are not required to be orthogonal, to perform such an update we also require an auxiliary model for the dependence of x_{k+1} on the previous x_n's (n = 1, ..., k). Let

    x_{k+1} = \sum_{n=1}^{k} b_n^k x_n + \gamma_k,  with ⟨γ_k, x_n⟩ = 0, n = 1, ..., k.    (5)

Thus \sum_{n=1}^{k} b_n^k x_n = P_{V_k} x_{k+1}, and γ_k = P_{V_k^⊥} x_{k+1} is the component of x_{k+1} which is unexplained by {x_1, ..., x_k}. Using the auxiliary model (5), it may be shown that the correct update from the kth-order model to the model of order k + 1 is given by

    a_n^{k+1} = a_n^k − α_k b_n^k,  n = 1, ..., k,    (6)

and a_{k+1}^{k+1} = α_k, where

    \alpha_k = \frac{\langle R_k f, x_{k+1} \rangle}{\langle \gamma_k, x_{k+1} \rangle}
             = \frac{\langle R_k f, x_{k+1} \rangle}{\|\gamma_k\|^2}
             = \frac{\langle R_k f, x_{k+1} \rangle}{\|x_{k+1}\|^2 - \sum_{n=1}^{k} b_n^k \langle x_n, x_{k+1} \rangle}.
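In finite dimensions the auxiliary model (5) is a least-squares problem, and the update (6) can then be applied directly. Below is a small numpy sketch of one model-order update under that reading; the names Xk, a, and residual are ours, and the b_n^k are obtained with a generic least-squares solve rather than the recursion developed later in Section 2.3.

```python
import numpy as np

def omp_order_update(Xk, x_new, a, residual):
    """One update from the k-th order model (3) to order k+1 (4).
    Xk: d-by-k matrix of previously selected atoms; a: their
    coefficients a_n^k; residual: R_k f (orthogonal to range(Xk))."""
    # Auxiliary model (5): Xk @ b = P_{V_k} x_{k+1}, gamma = P_{V_k^perp} x_{k+1}.
    b, *_ = np.linalg.lstsq(Xk, x_new, rcond=None)
    gamma = x_new - Xk @ b
    # alpha_k = <R_k f, x_{k+1}> / ||gamma_k||^2, as derived above.
    alpha = (residual @ x_new) / (gamma @ gamma)
    a_next = np.append(a - alpha * b, alpha)   # update (6)
    residual_next = residual - alpha * gamma   # R_{k+1} f = R_k f - alpha_k gamma_k, cf. (7) below
    return a_next, residual_next
```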

It also follows that the residual R_{k+1} f satisfies

    R_k f = R_{k+1} f + \alpha_k \gamma_k,  and  \|R_k f\|^2 = \|R_{k+1} f\|^2 + \frac{|\langle R_k f, x_{k+1} \rangle|^2}{\|\gamma_k\|^2}.    (7)

2.1 The OMP Algorithm

The results of the previous section may be used to construct the following algorithm, which we will refer to as Orthogonal Matching Pursuit (OMP).

Initialization: f_0 = 0, R_0 f = f, D_0 = {}, x_0 = 0, a_0 = 0, k = 0.

(I) Compute {⟨R_k f, x_n⟩ : x_n ∈ D \ D_k}.

(II) Find x_{n_{k+1}} ∈ D \ D_k such that |⟨R_k f, x_{n_{k+1}}⟩| ≥ α sup_j |⟨R_k f, x_j⟩|, with 0 < α ≤ 1.

(III) If |⟨R_k f, x_{n_{k+1}}⟩| < δ (for some chosen δ > 0), then stop.

(IV) Reorder the dictionary D by applying the permutation k + 1 ↔ n_{k+1}.

(V) Compute {b_n^k}_{n=1}^{k} such that x_{k+1} = \sum_{n=1}^{k} b_n^k x_n + γ_k and ⟨γ_k, x_n⟩ = 0, n = 1, ..., k.

(VI) Set a_{k+1}^{k+1} = α_k = ‖γ_k‖^{-2} ⟨R_k f, x_{k+1}⟩.

(VII) Set a_n^{k+1} = a_n^k − α_k b_n^k, n = 1, ..., k, and update the model:

    f_{k+1} = \sum_{n=1}^{k+1} a_n^{k+1} x_n,
    R_{k+1} f = f − f_{k+1},
    D_{k+1} = D_k ∪ {x_{k+1}}.

Increment k (k ← k + 1) and repeat (I)-(VII).

2.2 Some Properties of OMP

As in the case of MP, convergence of OMP relies on an energy conservation equation, which now takes the form (7). The following theorem summarizes the convergence properties of OMP.

Theorem 2.1 For f ∈ H, let {R_k f} be the residuals generated by OMP. Then

(i) lim_{k→∞} R_k f = P_{V^⊥} f;
(ii) f_N = P_{V_N} f, N = 1, 2, ....

Proof: The proof of convergence parallels the proof of Theorem 1 in [3]. The proof of the second property follows immediately from the orthogonality conditions of Equation (3).

Remarks: The key difference between MP and OMP lies in Property (ii) of Theorem 2.1. Property (ii) implies that at the Nth step we have the best approximation we can get using the N vectors we have selected from the dictionary. Therefore, in the case of finite dictionaries of size M, OMP converges in no more than M iterations to the projection of f onto the span of the dictionary elements. As mentioned earlier, Matching Pursuit does not possess this property.

2.3 Some Computational Details

As in the case of MP, the inner products {⟨R_k f, x_j⟩} may be computed recursively. For OMP we may express these recursions implicitly in the formula

    \langle R_k f, x_j \rangle = \langle f - f_k, x_j \rangle = \langle f, x_j \rangle - \sum_{n=1}^{k} a_n^k \langle x_n, x_j \rangle.    (8)

The only additional computation required for OMP arises in determining the b_n^k's of the auxiliary model (5). To compute the b_n^k's, we rewrite the normal equations associated with (5) as the system of linear equations

    v_k = A_k b_k,    (9)

where

    v_k = [\langle x_{k+1}, x_1 \rangle, \langle x_{k+1}, x_2 \rangle, ..., \langle x_{k+1}, x_k \rangle]^T,
    b_k = [b_1^k, b_2^k, ..., b_k^k]^T,

and

    A_k = \begin{bmatrix}
        \langle x_1, x_1 \rangle & \langle x_2, x_1 \rangle & \cdots & \langle x_k, x_1 \rangle \\
        \langle x_1, x_2 \rangle & \langle x_2, x_2 \rangle & \cdots & \langle x_k, x_2 \rangle \\
        \vdots & \vdots & \ddots & \vdots \\
        \langle x_1, x_k \rangle & \langle x_2, x_k \rangle & \cdots & \langle x_k, x_k \rangle
    \end{bmatrix}.

Note that the positive constant δ used in Step (III) of OMP guarantees nonsingularity of the matrix A_k, hence we may write

    b_k = A_k^{-1} v_k.    (10)

However, since A_{k+1} may be written as

    A_{k+1} = \begin{bmatrix} A_k & v_k \\ v_k^* & 1 \end{bmatrix}    (11)

(where * denotes conjugate transpose), it may be shown using the block matrix inversion formula that

    A_{k+1}^{-1} = \begin{bmatrix} A_k^{-1} + \beta_k b_k b_k^* & -\beta_k b_k \\ -\beta_k b_k^* & \beta_k \end{bmatrix},    (12)

where β_k = 1 / (1 − v_k^* b_k). Hence A_{k+1}^{-1}, and therefore b_{k+1}, may be computed recursively using A_k^{-1} and b_k from the previous step.
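Steps (I)-(VII) together with the recursive inverse (10)-(12) can be assembled into a complete routine. The sketch below is our illustration under the paper's assumptions (real inner products, unit-norm atoms); in place of the reordering of Step (IV) it keeps an index list, and the names and default tolerance are ours.

```python
import numpy as np

def omp(X, f, n_iter, delta=1e-10):
    """OMP steps (I)-(VII), maintaining A_k^{-1} via (12).
    X: d-by-M dictionary with unit-norm columns; f: target vector."""
    residual = f.astype(float).copy()         # R_0 f = f
    sel = []                                  # indices of D_k (replaces Step IV)
    A_inv = np.array([[1.0]])                 # A_1^{-1} = [<x_1, x_1>]^{-1} = [1]
    a = np.zeros(0)
    for _ in range(n_iter):
        inner = X.T @ residual                # (I): <R_k f, x_n>; (8) gives an equivalent recursion
        inner[sel] = 0.0                      # restrict the search to D \ D_k
        n_new = int(np.argmax(np.abs(inner)))
        if abs(inner[n_new]) < delta:         # (III) stopping rule
            break
        if sel:
            v = X[:, sel].T @ X[:, n_new]     # v_k of (9)
            b = A_inv @ v                     # b_k = A_k^{-1} v_k, (10)
            beta = 1.0 / (1.0 - v @ b)        # beta_k of (12)
            A_inv = np.block([                # block inversion formula (12)
                [A_inv + beta * np.outer(b, b), -beta * b[:, None]],
                [-beta * b[None, :],            np.array([[beta]])],
            ])
        sel.append(n_new)
        # Coefficients a^{k+1} solve the normal equations, so that
        # f_{k+1} = P_{V_{k+1}} f and, per (VII), R_{k+1} f = f - f_{k+1}.
        a = A_inv @ (X[:, sel].T @ f)
        residual = f - X[:, sel] @ a
    return sel, a, residual
```

On exit, the residual is orthogonal to every selected atom, matching Property (ii) of Theorem 2.1, rather than only to the last one as in MP.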

3 Examples

In the following examples we consider representations with respect to an affine wavelet frame constructed from dilates and translates of the second derivative of a Gaussian, i.e. D = {ψ_{m,n} : m, n ∈ Z}, where

    \psi_{m,n}(x) = 2^{m/2} \psi(2^m x - n),

and the analyzing wavelet is given by

    \psi(x) = \frac{2}{\sqrt{3}} \pi^{-1/4} (1 - x^2) e^{-x^2/2}.

Note that for wavelet dictionaries the initial set of inner products {⟨f, ψ_{m,n}⟩} is readily computed by one convolution followed by sampling at each dilation level m. The dictionary used in these examples consists of a total of 35 vectors.

In our first example, both OMP and MP were applied to the signal shown in Figure 2. We see from Figure 2(b) that OMP clearly converges in far fewer iterations than MP. The squared magnitudes of the coefficients a_{m,n} of the resulting representation are shown in Figure 3.

[Figure 2: Example I. (a) Original signal f, with the OMP approximation superimposed. (b) Squared L² norm of the residual R_k f versus iteration number, for both OMP (solid line) and MP (dashed line).]

[Figure 3: Distribution of coefficients obtained by applying OMP in Example I, plotted against translation index and dilation index. Shading is proportional to the squared magnitude of the coefficients a_{m,n}, with dark colors indicating large magnitudes.]

We could also compare the two algorithms on the basis of the computational effort required to compute representations of signals to within a prespecified error. However, such a comparison can only be made for a given signal and dictionary, as the number of iterations required for each algorithm depends on both the signal and the dictionary. For example, for the signal of Example I, we see from Figure 4 that it is 3.5 to 8 times more expensive to achieve a prespecified error using OMP, even though OMP converges in fewer iterations. On the other hand, for the signal shown in Figure 5, which lies in the span of three dictionary vectors, it is approximately 2 times more expensive to apply MP. In this case OMP converges in exactly three iterations.

[Figure 4: Computational cost (FLOPS) versus approximation error for both OMP (solid line) and MP (dashed line) applied to the signal in Example I.]

[Figure 5: Example II. (a) Original signal f. (b) Squared L² norm of the residual R_k f versus iteration number, for both OMP (solid line) and MP (dashed line).]
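A discretized version of the dictionary of this section is easy to construct by sampling ψ_{m,n} on a grid. In the sketch below, the grid, the scale and shift ranges, and the renormalization of each sampled atom to unit norm are our choices for illustration, not the parameters used in the paper's examples.

```python
import numpy as np

def mexican_hat(x):
    """Analyzing wavelet: second derivative of a Gaussian,
    psi(x) = (2/sqrt(3)) * pi**(-1/4) * (1 - x^2) * exp(-x^2/2)."""
    return (2.0 / np.sqrt(3.0)) * np.pi ** (-0.25) * (1.0 - x**2) * np.exp(-(x**2) / 2.0)

def wavelet_dictionary(t, scales, shifts):
    """Columns are sampled psi_{m,n}(x) = 2^{m/2} psi(2^m x - n),
    renormalized to unit norm after sampling (as OMP assumes)."""
    atoms = []
    for m in scales:
        for n in shifts:
            atom = 2.0 ** (m / 2.0) * mexican_hat(2.0**m * t - n)
            norm = np.linalg.norm(atom)
            if norm > 1e-12:          # skip atoms that vanish on the grid
                atoms.append(atom / norm)
    return np.column_stack(atoms)

# Example usage on the interval [-25, 25] seen in Figures 2 and 5
# (grid density and index ranges are ours):
t = np.linspace(-25.0, 25.0, 512)
D = wavelet_dictionary(t, scales=range(-4, 3), shifts=range(-20, 21))
```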

4 Summary and Conclusions

In this paper we have described a recursive algorithm, which we refer to as Orthogonal Matching Pursuit (OMP), to compute representations of signals with respect to arbitrary dictionaries of elementary functions. The algorithm we have described is a modification of the Matching Pursuit (MP) algorithm of Mallat and Zhang [3] that improves convergence using an additional orthogonalization step. The main benefit of OMP over MP is the fact that it is guaranteed to converge in a finite number of steps for a finite dictionary. We also demonstrated that all additional computation required for OMP may be performed recursively. The two algorithms, MP and OMP, were compared on two simple examples of decomposition with respect to a wavelet dictionary. It was noted that although OMP converges in fewer iterations than MP, the computational effort required for each algorithm depends on both the class of signals and the choice of dictionary. Although we do not provide a rigorous argument here, it seems reasonable to conjecture that OMP will be computationally cheaper than MP for very redundant dictionaries, as knowledge of the redundancy is exploited in OMP to reduce the error as much as possible at each step.

Acknowledgements

This research of Y.C.P. was supported in part by NASA Headquarters, Center for Aeronautics and Space Information Sciences (CASIS) under Grant NAGW-419, S6, and in part by the Advanced Research Projects Agency of the Department of Defense monitored by the Air Force Office of Scientific Research under Contract F49620-93-1-0085. This research of R.R. and P.S.K. was supported in part by the Air Force Office of Scientific Research under Contract F49620-92-J-0500, the AFOSR University Research Initiative Program under Grant AFOSR-90-0105, by the Army Research Office under Smart Structures URI Contract no. DAAL03-92-G-0121, and by the National Science Foundation's Engineering Research Centers Program, NSFD CDR 8803012.

References

[1] D. Donoho, I. Johnstone, P. Rousseeuw, and W. Stahel. Discussion following the article by P. Huber. The Annals of Statistics, 13(2):496-500, 1985.

[2] P. J. Huber. Projection pursuit. The Annals of Statistics, 13(2):435-475, 1985.

[3] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. Preprint, submitted to IEEE Transactions on Signal Processing, 1992.