One-norm regularized inversion: learning from the Pareto curve

Gilles Hennenfent and Felix J. Herrmann, Earth & Ocean Sciences Dept., the University of British Columbia

ABSTRACT

Geophysical inverse problems typically involve a trade-off between data misfit and some prior. Pareto curves trace the optimal trade-off between these two competing aims. These curves are commonly used in problems with two-norm priors, where they are plotted on a log-log scale and known as L-curves. For other priors, such as the sparsity-promoting one norm, Pareto curves remain relatively unexplored. First, we show how these curves provide an objective criterion to gauge how robust one-norm solvers are when they are limited by a maximum number of matrix-vector products that they can perform. Second, we use Pareto curves and their properties to define and compute one-norm compressibilities. We argue that this notion is key to understanding one-norm regularized inversion. Third, we illustrate the correlation between the one-norm compressibility and the performance of Fourier and curvelet reconstructions with sparsity-promoting inversion.

INTRODUCTION

Many inverse problems in geophysics are ill posed (Parker, 1994): their solutions are not unique or are acutely sensitive to changes in the data. To solve this kind of problem stably, additional information must be introduced, a technique called regularization (see, e.g., Phillips, 1962; Tikhonov, 1963). Specifically, when the solution of an ill-posed problem is known to be (almost) sparse, Oldenburg et al. (1983) and others have observed that a good approximation to the solution can be obtained by using one-norm regularization to promote sparsity. More recently, results in information theory have breathed new life into the idea of promoting sparsity to regularize ill-posed inverse problems.
These results establish that, under certain conditions, the sparsest solution of a (severely) underdetermined linear system can be exactly recovered by seeking the minimum one-norm solution (Candès et al., 2006; Donoho, 2006; Rauhut, 2007). This has led to tremendous activity in the newly established field of compressed sensing and a resurgence of one-norm regularized inversion. In this publication, we demonstrate how the Pareto curve can be used to make quantitative assessments regarding the performance of one-norm regularized, transform-based recovery from incomplete data. First, we outline the general problem formulation and associated convex optimization problems. Second, we give a brief overview of the Pareto curve and its properties. Third, we discuss two possible applications of this curve to gain further insights into one-norm regularized inverse problems and make quantitative assessments regarding their success.

PROBLEM STATEMENT

Consider the following underdetermined system of linear equations

    y = Ax_0 + n,    (1)

where the n-vectors y and n represent observations and additive noise, respectively. The n-by-N matrix A is the modeling operator that links the model x_0 to the noise-free data given by y - n. We assume that N >> n and that x_0 has few nonzero or significant entries. We use the terms model and observations in a broad sense, so that many linear geophysical problems can be cast in the form shown in Equation 1.

Because x_0 is assumed to be (almost) sparse, one can promote sparsity as a prior via one-norm regularization to overcome the singular nature of A when estimating x_0 from y. The challenge is to best balance the data misfit, defined as ||y - Ax||_2, with the regularization term ||x||_1. One of the following three convex optimization problems can be used to this effect:

    QP_λ:  min_x 1/2 ||y - Ax||_2^2 + λ||x||_1,
    BP_σ:  min_x ||x||_1  subject to  ||y - Ax||_2 <= σ,
    LS_τ:  min_x 1/2 ||y - Ax||_2^2  subject to  ||x||_1 <= τ.

QP_λ closely relates to quadratic programming (QP) and is probably the most commonly used in geophysics. However, it is generally not clear how to choose the Lagrange multiplier λ >= 0 such that the solution of QP_λ is, in some sense, optimal. The basis pursuit (BP) denoise problem (Chen et al., 1998) is often preferred when an estimate of the noise level σ >= 0 in the data is available. The LASSO problem (Tibshirani, 1996) is of lesser interest because an estimate of the one norm of the solution, τ >= 0, is typically not available for geophysical problems.
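For intuition, consider the special case where A is orthonormal: QP_λ then decouples into scalar problems whose exact minimizer is coordinate-wise soft thresholding. The following sketch, with NumPy as an assumed dependency and all sizes chosen for illustration only, checks this closed form against a brute-force one-dimensional search:

```python
import numpy as np

def soft_threshold(z, lam):
    """Coordinate-wise minimizer of 1/2*(z - x)^2 + lam*|x|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# With A orthonormal, QP_lambda reduces to min_x 1/2||z - x||_2^2 + lam||x||_1
# with z = A^T y, which soft thresholding solves exactly.
rng = np.random.default_rng(1)
A = np.linalg.qr(rng.standard_normal((8, 8)))[0]   # orthonormal matrix
x0 = np.zeros(8)
x0[[1, 5]] = [3.0, -2.0]                           # sparse model
y = A @ x0
lam = 0.5
x_hat = soft_threshold(A.T @ y, lam)               # shrinks x0 towards zero by lam

# Sanity check on one coordinate against a brute-force grid search.
z = (A.T @ y)[1]
grid = np.linspace(-5, 5, 100001)
obj = 0.5 * (z - grid) ** 2 + lam * np.abs(grid)
assert abs(grid[np.argmin(obj)] - soft_threshold(z, lam)) < 1e-3
```

The shrinkage by λ visible in `x_hat` is the bias that the cooling strategies discussed later try to remove by letting λ decrease towards zero.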
To gain insights into one-norm regularized inversion, we propose to look at the Pareto curve. In the context of two-norm, i.e., Tikhonov, regularization, the Pareto curve is commonly used; plotted on a log-log scale, it is known as the L-curve (Lawson and Hanson, 1974).

PARETO CURVE

The Pareto curve traces, for a specific pair of A and y, the optimal trade-off in the space spanned by the data misfit and the one-norm regularization term. Figure 1 gives a schematic illustration. Point 1 clarifies the connection between the three parameters of QP_λ, BP_σ, and LS_τ (see van den Berg and Friedlander, 2008; Hennenfent et al., 2008, for more details). The coordinates of a point on the Pareto curve are (τ, σ) and the slope of the tangent at this point is -λ. The end points of the curve, points 2 and 3, are two special cases. When τ = 0, the solution of LS_τ is x = 0 (point 2). It coincides with the solution of BP_σ with σ = ||y||_2. When σ = 0, the solution of BP_σ (point 3) coincides with the solutions of LS_τ, where τ = τ_BP0, and of QP_λ, where λ = 0+, i.e., λ infinitely close to zero from above. These relations are formalized as follows in van den Berg and Friedlander (2008):

Result 1. The Pareto curve i) is convex and decreasing, ii) is continuously differentiable, and iii) has negative slope -λ.

Figure 1: Schematic illustration of a Pareto curve, with ||x||_1 = τ on the horizontal axis and ||y - Ax||_2 = σ on the vertical axis. Point 1, at (τ, σ), exposes the connection between the three parameters of QP_λ, BP_σ, and LS_τ; the slope of the tangent there is -λ. Point 2, at (0, ||y||_2), corresponds to the trivial solution, i.e., x = 0, and point 3, at (τ_BP0, 0), to a solution of BP_σ with σ = 0. The region below the curve is unreachable.

For large-scale geophysical applications, it is not practical (or even feasible) to sample the entire Pareto curve. However, its regularity, as implied by Result 1, means that it is possible to obtain a good approximation to the curve with very few interpolating points, as illustrated by Hennenfent et al. (2008) on a wavefield reconstruction problem using curvelets (Herrmann and Hennenfent, 2008).
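The properties in Result 1 can be verified numerically in the toy case A = I, where QP_λ is solved exactly by soft thresholding and the whole curve can be swept by varying λ. The sketch below (NumPy assumed; the closed form holds only for this special choice of A) traces (τ, σ) points and checks monotonicity and convexity discretely:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# For A = I, the QP_lambda solution is x(lam) = soft_threshold(y, lam);
# sweeping lam traces the Pareto curve (tau, sigma) = (||x||_1, ||y - x||_2).
rng = np.random.default_rng(0)
y = rng.standard_normal(50)
lams = np.linspace(np.abs(y).max(), 0.0, 40)
pts = []
for lam in lams:
    x = soft_threshold(y, lam)
    pts.append((np.abs(x).sum(), np.linalg.norm(y - x)))
tau, sigma = np.array(pts).T

# Result 1, checked discretely: sigma decreases as tau grows, and the curve
# is convex, i.e., chord slopes are negative and nondecreasing (less steep).
assert np.all(np.diff(tau) >= 0) and np.all(np.diff(sigma) <= 0)
slopes = np.diff(sigma) / np.diff(tau)
assert np.all(slopes <= 0)
assert np.all(np.diff(slopes) >= -1e-9)   # convexity, up to round-off
```

Each chord slope lies between the tangent slopes -λ_i and -λ_{i+1} of its endpoints, which is the discrete counterpart of the tangent-slope property at point 1 of Figure 1.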

APPLICATIONS

Comparison of one-norm solvers

To understand the robustness of different one-norm solvers that are limited by a maximum number of matrix-vector products, Hennenfent et al. (2008) propose to track on a graph the data misfit versus the one norm of successive iterates for each solver. The Pareto curve serves as a reference and gives a rigorous yardstick for measuring the quality of the solution path generated by each algorithm. Figure 2 shows the Pareto curve and the BP_0 solution paths of four one-norm solvers: iterative soft thresholding (IST) introduced by Daubechies et al. (2004), its extension to include cooling (ISTc; Figueiredo and Nowak, 2003; Herrmann and Hennenfent, 2008), the spectral projected-gradient algorithm (SPGL1) introduced by van den Berg and Friedlander (2008), and iterative re-weighted least-squares (IRLS; Gersztenkorn et al., 1986), which uses a quadratic approximation to the one-norm regularization function. The maximum number of iterations is small compared to the size of the problem and fixed, which roughly equates to using the same number of matrix-vector products for each solver. The starting vector provided to each solver is the zero vector, and hence the paths start at (0, ||y||_2), point 2 in Figure 1. The problem considered is a benchmark problem typically used in the compressed sensing literature (Donoho et al., 2006). The matrix A is taken to have Gaussian independent and identically-distributed entries; a sparse solution x_0 is randomly generated, and the observations y are computed according to Equation 1. Whereas SPGL1 provides a fairly accurate approximation to the BP_0 solution, those computed by IST, ISTc, and IRLS suffer from larger errors. IST solves QP_λ with λ = 0+. Because there is hardly any regularization, IST first works towards minimizing the data misfit.
When the data misfit is sufficiently small, the effect of the one-norm penalization kicks in, yielding a change of direction towards the BP_0 solution. The limited number of matrix-vector products terminates the procedure early: the data misfit at the candidate solution is small, but the one norm is completely incorrect. ISTc solves QP_λ for a decreasing sequence λ_i -> 0. The starting vector for QP_{λ_i} is the solution of QP_{λ_{i-1}}, which is by definition on the Pareto curve when each subproblem is accurately solved. This explains why ISTc so closely follows the curve, at least at the beginning of the path. Towards the end, ISTc accumulates small errors because there are not enough iterations to solve each subproblem to sufficient accuracy. IRLS suffers from similar difficulties.

Compression of seismic energy

Another insightful application of the Pareto curve arises when the matrix A is defined as the synthesis operator of a (redundant) transform, e.g., the Fourier or curvelet (Candès et al., 2005, and references therein) transform. In this case, the Pareto curve measures

Figure 2: Pareto curve and optimization paths (same limited number of iterations) of four solvers for a BP_0 problem. The symbols + represent a sampling of the Pareto curve. The solid line is the solution path of ISTc, the chain line the path of SPGL1, the dashed line the path of IST, and the dotted line the path of IRLS.
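The IST iteration and its cooled variant can be sketched in a few lines. This is a schematic re-implementation on a small Gaussian test problem, not the code behind Figure 2; NumPy is an assumed dependency and the problem sizes and cooling schedule are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ist(A, y, lam, x, n_iter, step):
    # Iterative soft thresholding for QP_lambda (Daubechies et al., 2004):
    # a gradient step on the misfit followed by soft thresholding.
    for _ in range(n_iter):
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x

# Benchmark-style setup: Gaussian i.i.d. A and a random sparse x0.
rng = np.random.default_rng(2)
m, n, k = 80, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L the squared spectral norm

# ISTc: solve QP_lambda for a cooling sequence lam_i -> 0, warm-starting
# each subproblem at the previous solution so iterates hug the Pareto curve.
x = np.zeros(n)
for lam in np.geomspace(np.abs(A.T @ y).max(), 1e-4, 25):
    x = ist(A, y, lam, x, 40, step)
assert np.linalg.norm(x - x0) / np.linalg.norm(x0) < 0.1
```

With a generous iteration budget the cooled iteration nearly reaches the BP_0 solution; truncating the loops early reproduces the accumulated errors discussed above.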

the compressibility, in the one-norm sense, of the data y in a given transform domain. This can be used to compare different transforms. In Figure 3, we compare the Fourier transform with the curvelet transform. The synthesis operators have unit-norm columns. In the first experiment, y is defined as the real migrated image depicted in Figure 3(a); Figure 3(b) shows the corresponding Pareto curves approximated with seven interpolating points. In the second experiment, y is defined as the synthetic shot gather depicted in Figure 3(c), and Figure 3(d) displays the corresponding Pareto curves, also approximated with seven interpolating points. In both experiments, the curvelet transform compresses the seismic energy better. The improvement over the Fourier transform is particularly sizable in the case of the shot gather. Although these observations do not, by themselves, explain the success of sparsity-promoting formulations using the curvelet transform, they provide a better justification than the arguments proposed by Candès et al. (2005) and Hennenfent and Herrmann (2006). These arguments, rooted in the empirical decay rate of the magnitude-sorted transform coefficients, link a higher decay rate to a better performance of one-norm recovery. Whereas this reasoning is viable for non-redundant transforms, it loses some of its strength for redundant signal expansions, where more coefficients are needed to approximate the signal. This confronts us with a dilemma because results using the overcomplete curvelet transform generally show an improvement over Fourier-based techniques, an observation that would be consistent with increased sparsity.
In this case, an argument can be made either by considering decay rates as a function of the percentage of the total number of coefficients (Hennenfent and Herrmann, 2006) or by working with a reduced transform-domain coefficient vector to compensate for the redundancy of the transform (Candès et al., 2005). Unfortunately, these explanatory devices are rather unsatisfactory. Therefore, we empirically study whether there exists a link between the one-norm compressibility and the performance of sparsity-promoting algorithms. The test problem is the reconstruction of an incomplete, noise-free wavefield. The methods considered are Fourier reconstruction with sparsity-promoting inversion (FRSI; Zwartjes, 2005) and its curvelet counterpart (CRSI; Herrmann and Hennenfent, 2008). Both methods are implemented using SPGL1. The simulated input data (Figure 4(a)) is the shot gather in Figure 3(c) with randomly-missing traces. Figures 4(b) and 4(c) show the FRSI and CRSI results, respectively. The corresponding signal-to-noise ratios (SNR) are 9.25 dB and 13.55 dB. Figure 4(d) displays the SPGL1 paths to the BP_0 solutions. Note how these paths behave similarly to the Pareto curves in Figure 3(d). Furthermore, the offset between the BP_0 solutions remains remarkably constant: the restriction operator used in FRSI and CRSI did not fundamentally change the structure of the problems.
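As a toy analogue of this kind of reconstruction, a 1-D trace with a sparse Fourier spectrum can be recovered from randomly decimated samples by cooled soft thresholding in the Fourier domain. This is only a schematic illustration of the principle (NumPy assumed; FRSI itself operates on 2-D wavefields and is solved with SPGL1):

```python
import numpy as np

def soft_complex(z, lam):
    # Complex soft thresholding: shrink magnitudes, keep phases.
    mag = np.abs(z)
    return np.where(mag > lam, (1.0 - lam / np.maximum(mag, 1e-30)) * z, 0.0)

rng = np.random.default_rng(3)
n = 256
t = np.arange(n)
f = np.cos(2 * np.pi * 10 * t / n) + 0.5 * np.sin(2 * np.pi * 37 * t / n)
mask = rng.random(n) < 0.6          # ~60% of the samples survive
y = mask * f                        # zero-filled observed data

# A = R F^H, with R the restriction and F the unitary DFT; ||A|| <= 1, so a
# unit gradient step is safe. Cool the threshold towards zero (ISTc-style).
z = np.zeros(n, dtype=complex)
for lam in np.geomspace(1.0, 1e-4, 30):
    for _ in range(15):
        r = mask * np.fft.ifft(z, norm="ortho") - y
        z = soft_complex(z - np.fft.fft(mask * r, norm="ortho"), lam)
f_hat = np.fft.ifft(z, norm="ortho").real

# The sparsity-promoting reconstruction should beat plain zero filling.
assert np.linalg.norm(f_hat - f) < 0.1 * np.linalg.norm(y - f)
```

Because the spectrum here is genuinely sparse, the recovery is near-exact; less compressible data would sit on a higher Pareto curve and reconstruct correspondingly worse, which is the correlation the experiment above probes.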

Figure 3: One-norm compressibility in the Fourier and curvelet domains. (a) Real migrated image and (b) corresponding Pareto curves for both transforms. (c) Synthetic shot gather and (d) corresponding Pareto curves for both transforms. The symbols F trace the Fourier curves and the symbols C the curvelet curves.

Figure 4: Noise-free wavefield reconstruction using the Fourier or curvelet transform. (a) Simulated acquired data, (b) FRSI result (SNR = 9.25 dB), (c) CRSI result (SNR = 13.55 dB), and (d) SPGL1 paths to the BP_0 solutions of FRSI (dashed line) and CRSI (solid line).

CONCLUSIONS

The current resurgence of successful one-norm regularized inverse problems motivates the need for a better understanding of the inner workings of these problems. The sheer size of geophysical applications makes this task difficult, though. We show that the Pareto curve, thanks to its properties, is a practical and insightful tool. This curve serves, for example, as a reference for making an informed decision on how to best truncate the solution process of one-norm solvers and reduce the amount of computation. The Pareto curve and its properties are also useful in defining and computing the one-norm compressibility of a signal in a transform domain, as we illustrate using the Fourier and curvelet transforms. Furthermore, we observe that our extended notion of compressibility can be used to make quantitative assessments regarding the performance of one-norm regularized, transform-based recovery from incomplete data. This prospect is particularly exciting and can possibly be extended to other one-norm regularized inversions.

ACKNOWLEDGMENTS

The authors thank E. van den Berg, Ö. Yilmaz, and M. P. Friedlander for fruitful discussions. This publication was prepared using Madagascar (rsf.sf.net), a package for reproducible computational experiments, SPGL1 (cs.ubc.ca/labs/scl/spgl1), a one-norm solver, and Sparco (cs.ubc.ca/labs/scl/sparco), a suite of linear operators and problems for testing algorithms for sparse signal reconstruction. This research was in part financially supported by NSERC Discovery Grant 22R81254 and CRD Grant DNOISE 334810-05 of F.J.H., and was carried out as part of the SINBAD project with support, secured through ITF, from the following organizations: BG Group, BP, Chevron, ExxonMobil, and Shell.

REFERENCES

van den Berg, E., and M. P. Friedlander, 2008, Probing the Pareto frontier for basis pursuit solutions: Technical Report TR-2008-01, UBC Computer Science Department.
(http://www.optimization-online.org/db_html/2008/01/1889.html).
Candès, E. J., L. Demanet, D. L. Donoho, and L. Ying, 2005, Fast discrete curvelet transforms: Multiscale Modeling and Simulation, 5, 861-899.
Candès, E. J., J. Romberg, and T. Tao, 2006, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information: IEEE Transactions on Information Theory, 52, no. 2, 489-509.
Chen, S. S., D. L. Donoho, and M. A. Saunders, 1998, Atomic decomposition by basis pursuit: SIAM Journal on Scientific Computing, 20, no. 1, 33-61.
Daubechies, I., M. Defrise, and C. De Mol, 2004, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint: Communications on Pure and Applied Mathematics, LVII, 1413-1457.
Donoho, D. L., 2006, Compressed sensing: IEEE Transactions on Information Theory, 52, no. 4, 1289-1306.
Donoho, D. L., Y. Tsaig, I. Drori, and J.-L. Starck, 2006, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit: Technical Report TR-2006-2, Stanford Statistics Department. (http://stat.stanford.edu/~idrori/stomp.pdf).
Figueiredo, M., and R. Nowak, 2003, An EM algorithm for wavelet-based image restoration: IEEE Transactions on Image Processing, 12, no. 8, 906-916.
Gersztenkorn, A., J. B. Bednar, and L. Lines, 1986, Robust iterative inversion for the one-dimensional acoustic wave equation: Geophysics, 51, no. 2, 357-369.
Hennenfent, G., and F. J. Herrmann, 2006, Seismic denoising with non-uniformly sampled curvelets: Computing in Science and Engineering, 8, no. 3, 16-25.
Hennenfent, G., E. van den Berg, M. P. Friedlander, and F. J. Herrmann, 2008, New insights into one-norm solvers from the Pareto curve: Geophysics.
Herrmann, F. J., and G. Hennenfent, 2008, Non-parametric seismic data recovery with curvelet frames: Geophysical Journal International, 173, 233-248.
Lawson, C. L., and R. J. Hanson, 1974, Solving least squares problems: Prentice Hall.
Oldenburg, D., T. Scheuer, and S. Levy, 1983, Recovery of the acoustic impedance from reflection seismograms: Geophysics, 48, no. 10, 1318-1337.
Parker, R. L., 1994, Geophysical inverse theory: Princeton University Press.
Phillips, D. L., 1962, A technique for the numerical solution of certain integral equations of the first kind: Journal of the Association for Computing Machinery, 9, no. 1, 84-97.
Rauhut, H., 2007, Random sampling of sparse trigonometric polynomials: Applied and Computational Harmonic Analysis, 22, no. 1, 16-42.
Tibshirani, R., 1996, Regression shrinkage and selection via the LASSO: Journal of the Royal Statistical Society, Series B, 58, no. 1, 267-288.
Tikhonov, A. N., 1963, Solution of incorrectly formulated problems and the regularization method: Soviet Mathematics Doklady, 4, 1035-1038.
Zwartjes, P. M., 2005, Fourier reconstruction with sparse inversion: PhD thesis, Delft University of Technology.