A projected Hessian matrix for full waveform inversion
Yong Ma and Dave Hale, Center for Wave Phenomena, Colorado School of Mines

SUMMARY

A Hessian matrix in full waveform inversion (FWI) is difficult to compute directly because of high computational cost and an especially large memory requirement. Therefore, Newton-like methods are rarely feasible in realistic large-size FWI problems. We modify the quasi-Newton BFGS method to use a projected Hessian matrix that reduces both the computational cost and the memory required, thereby making a quasi-Newton solution to FWI feasible.

INTRODUCTION

Full waveform inversion (FWI) (Tarantola, 1984) is usually formulated as an optimization problem, in which we minimize a nonlinear objective function E: R^N -> R,

    \min_{m \in \mathbb{R}^N} E(m) ,    (1)

where m denotes N parameters, such as seismic velocities, for an earth model. In FWI, E often takes a least-squares form: E(m) = (1/2) ||d - F(m)||^2, where ||.|| denotes an L2 norm, d denotes the recorded data, and F is a forward data-synthesizing operator, a nonlinear function of the model m.

We consider only iterative methods, such as Newton's method and quasi-Newton methods, for solutions to FWI. In the i-th iteration of such methods, we update the model m in the direction of a vector p_i:

    m_{i+1} = m_i + \alpha_i p_i ,    (2)

for some step length alpha_i. Newton's and quasi-Newton methods differ in the way in which they compute and use p_i.

In Newton's method, we ignore the higher-order (> 2) terms in the Taylor series of E(m):

    E(m_i + \delta m_i) \approx E(m_i) + \delta m_i^T g_i + \frac{1}{2} \delta m_i^T H_i \, \delta m_i ,    (3)

and minimize this approximated E(m) by solving

    H_i \, \delta m_i = -g_i ,    (4)

where the gradient g_i \equiv \partial E / \partial m_i and the Hessian matrix H_i \equiv \partial^2 E / \partial m_i^2. Therefore, the update direction in Newton's method is simply

    p_i = -H_i^{-1} g_i ,    (5)

with step length alpha_i = 1.

Nevertheless, Newton's method is suitable only for solving small- or medium-size optimization problems (Pratt et al., 1998; Virieux and Operto, 2009), in which the number N of model parameters ranges from hundreds to thousands. For models with a large number N of parameters, the high cost of computing the Hessian H_i prevents the application of Newton-like methods. If, say, N = 500,000 in 2D FWI, then even with the efficient approach of Pratt et al. (1998) we must solve the forward problem F(m) 500,000 times to compute the Hessian in every iteration of FWI. Furthermore, the memory required to store this Hessian matrix in single precision is 1 terabyte, or about 1/2 terabyte considering symmetry in the Hessian.

Quasi-Newton methods do not compute the Hessian matrix explicitly, but instead iteratively update a Hessian approximation. The BFGS method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) is by far the most popular way (Nocedal and Wright, 2000) to update the Hessian matrix:

    H_{i+1} = H_i + \frac{y_i y_i^T}{y_i^T \delta m_i} - \frac{H_i \delta m_i (H_i \delta m_i)^T}{\delta m_i^T H_i \delta m_i} ,    (6)

where y_i = g_{i+1} - g_i and delta m_i = m_{i+1} - m_i.

Although the BFGS method reduces the computation time required to approximate a Hessian matrix, it does not reduce the O(N^3) computation time required to use the Hessian matrix to update models (equation 4), nor does it reduce the O(N^2) memory required to store Hessian approximations. The only way to reduce these costs is to reduce the number N of model parameters. For this purpose, we introduce a projected Hessian matrix in a sparse model space. Tests of our projected Hessian on the Marmousi II model suggest that quasi-Newton FWI may be promising in practical applications.
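In matrix-vector form, the BFGS update in equation 6 takes only a few lines of code. The following is a minimal NumPy sketch with illustrative names, not the implementation used for the examples in this paper; for FWI-scale N, the dense N x N arrays in exactly this update are what become infeasible, which motivates the projection introduced next.

```python
import numpy as np

def bfgs_update(H, dm, y):
    """One classic BFGS update of a Hessian approximation H (equation 6).

    dm : model step, m_{i+1} - m_i
    y  : gradient change, g_{i+1} - g_i
    """
    Hdm = H @ dm
    return (H
            + np.outer(y, y) / (y @ dm)          # rank-one term built from the gradient change
            - np.outer(Hdm, Hdm) / (dm @ Hdm))   # removes old curvature along the previous step
```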
PROJECTED HESSIAN MATRIX

In an approach similar to that used in subspace methods (Kennett et al., 1988; Oldenburg et al., 1993) and alternative parameterizations (Pratt et al., 1998, Appendix A), we construct a finely and uniformly sampled (dense) model m from a sparse model s that contains a much smaller number n << N of model parameters:

    m = R s ,    (7)

where R is an interpolation operator that interpolates model parameters from the sparse model s to the dense model m. FWI is then reformulated as a new sparse optimization problem (Pratt et al., 1998, Appendix A; Ma et al., 2010), in which we minimize a new nonlinear objective function E: R^n -> R,

    \min_{s \in \mathbb{R}^n} E(R s) .    (8)

In the optimization problem posed in equation 8, model parameters are not determined independently as in equation 1. Instead, the n sparse parameters in s constrain the other N - n parameters in m. Therefore, equation 8 is equivalent to a linearly constrained optimization with N - n constraints. In the context of constrained optimization, a projected Hessian matrix is suggested by Gill et al. (1981) and several other authors.
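To make equation 7 concrete, the sketch below builds a small dense-from-sparse interpolation matrix R and applies it to a sparse model s. Here R is ordinary 1D linear interpolation, a deliberately simple stand-in for the image-guided interpolation operator used later in this paper; all names and numbers are illustrative.

```python
import numpy as np

def linear_interp_matrix(sparse_x, dense_x):
    """Build a dense-from-sparse interpolation matrix R (equation 7),
    here by simple 1D linear interpolation between sparse sample locations.
    (The paper uses image-guided interpolation; this is only a stand-in.)
    """
    n, N = len(sparse_x), len(dense_x)
    R = np.zeros((N, n))
    for j, x in enumerate(dense_x):
        k = np.searchsorted(sparse_x, x)      # interval of sparse samples containing x
        if k == 0:
            R[j, 0] = 1.0                     # clamp at the left end
        elif k == n:
            R[j, n - 1] = 1.0                 # clamp at the right end
        else:
            w = (x - sparse_x[k - 1]) / (sparse_x[k] - sparse_x[k - 1])
            R[j, k - 1], R[j, k] = 1.0 - w, w
    return R

# usage: dense model m from sparse model s (equation 7)
sparse_x = np.array([0.0, 1.0, 2.5, 4.0])     # sparse sample locations (km, illustrative)
dense_x  = np.linspace(0.0, 4.0, 101)         # dense model grid
R = linear_interp_matrix(sparse_x, dense_x)
s = np.array([1.5, 2.0, 2.2, 3.0])            # sparse model parameters (e.g., velocities)
m = R @ s                                     # m = Rs
```

With R available as an explicit (or matrix-free) linear operator, its transpose R^T supplies the adjoint needed for the projected gradient and projected Hessian introduced next.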

Projected Hessian

Rewrite equation 3 as

    E(R s_i + R \delta s_i) \approx E(R s_i) + \delta s_i^T R^T g_i + \frac{1}{2} \delta s_i^T R^T H_i R \, \delta s_i ,    (9)

where we refer to R^T H_i R as a projected Hessian matrix (Gill et al., 1981) and R^T g_i as a projected gradient. Here R^T denotes the adjoint of the interpolation operator R. We then minimize this approximated E by solving a set of n linear equations:

    R^T H_i R \, \delta s_i = -R^T g_i ,    (10)

with a solution for the n unknowns

    \delta s_i = -\left( R^T H_i R \right)^{-1} R^T g_i .    (11)

Therefore, in Newton's method the update direction vector becomes

    p_i = -\left( R^T H_i R \right)^{-1} R^T g_i ,    (12)

for a step length alpha_i = 1.

The projected Hessian R^T H_i R is an n x n symmetric matrix. Figure 1 describes this projected Hessian in a schematic fashion: the tall and thin rectangle denotes the interpolation operator R, the short and fat rectangle denotes the adjoint operator R^T, and the large and small squares denote the original and projected Hessian matrices, respectively. Figure 1 illustrates how memory consumption is reduced from O(N^2) to O(n^2). Computation is likewise reduced from O(N^3) to O(n^3).

Figure 1: Schematic for computing a projected Hessian matrix.

Projected BFGS method

A projected Hessian matrix can be approximated iteratively with an approach similar to the classic BFGS method. Apply the interpolation operator R and its adjoint operator R^T to both sides of equation 6 to obtain

    R^T H_{i+1} R = R^T H_i R + \frac{R^T y_i y_i^T R}{y_i^T \delta m_i} - \frac{R^T H_i \delta m_i \left( R^T H_i \delta m_i \right)^T}{\delta m_i^T H_i \delta m_i} .    (13)

Now simplify equation 13 by defining H̃_i \equiv R^T H_i R, g̃_i \equiv R^T g_i, and ỹ_i \equiv R^T y_i = g̃_{i+1} - g̃_i to obtain an update formula for the projected BFGS method:

    \tilde{H}_{i+1} = \tilde{H}_i + \frac{\tilde{y}_i \tilde{y}_i^T}{\tilde{y}_i^T \delta s_i} - \frac{\tilde{H}_i \delta s_i \left( \tilde{H}_i \delta s_i \right)^T}{\delta s_i^T \tilde{H}_i \delta s_i} .    (14)

Equation 14 has the same structure as equation 6, so we refer to this formula for updating the projected Hessian as the projected BFGS method, a projected version of the classic BFGS method in the sparse model space s. Pseudo-code for implementing this projected BFGS method is as follows (a runnable sketch of this loop follows the Examples paragraph below):

    given s_0, g̃_0 = R^T g_0, and H̃_0 = I
    for i = 0, 1, 2, ..., until convergence do
        p_i = -H̃_i^{-1} g̃_i
        search for step length alpha_i; delta s_i = alpha_i p_i
        m_{i+1} = m_i + R delta s_i
        g_{i+1} = grad E(m_{i+1}); g̃_{i+1} = R^T g_{i+1}
        ỹ_i = g̃_{i+1} - g̃_i
        H̃_{i+1} = H̃_i + (ỹ_i ỹ_i^T)/(ỹ_i^T delta s_i) - (H̃_i delta s_i)(H̃_i delta s_i)^T/(delta s_i^T H̃_i delta s_i)
    end for

EXAMPLES

We test our projected BFGS method on the Marmousi II model. Figure 2a shows the true model m, and Figure 2b shows the initial model m_0, a highly smoothed version of the true model. We use 11 shots uniformly distributed on the surface and a 15 Hz Ricker wavelet as the source for simulating wavefields. The source and receiver intervals are 0.76 km and 0.024 km, respectively. In this example, the dense model space has N = 391 x 1151 parameters. Therefore, either computation or storage of the Hessian matrix for the dense model is infeasible.

Figure 2: The true model and the initial model used in inversion.
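The projected BFGS pseudo-code above translates almost line for line into the following sketch. It is only an illustration: `misfit` and `gradient` stand for a user-supplied FWI objective E(m) and its gradient in the dense model space, R is a dense-from-sparse interpolation matrix as in equation 7, and the strong-Wolfe line search used in the paper is replaced here by simple backtracking.

```python
import numpy as np

def projected_bfgs_fwi(m0, R, misfit, gradient, niter=10):
    """Sketch of the projected BFGS iteration (equation 14 and the pseudo-code above).

    m0       : initial dense model (1D array of length N)
    R        : (N, n) dense-from-sparse interpolation matrix (equation 7)
    misfit   : callable, E(m)
    gradient : callable, dE/dm in the dense model space
    """
    n = R.shape[1]
    H = np.eye(n)                        # initial projected Hessian, H~_0 = I
    m = m0.copy()
    g = R.T @ gradient(m)                # projected gradient g~ = R^T g
    for _ in range(niter):
        p = -np.linalg.solve(H, g)       # update direction in the sparse model space
        # backtracking line search (stand-in for the strong Wolfe search in the paper)
        alpha, E0 = 1.0, misfit(m)
        while misfit(m + R @ (alpha * p)) > E0 and alpha > 1e-6:
            alpha *= 0.5
        ds = alpha * p                   # sparse model step, delta s_i
        m = m + R @ ds                   # dense model update, m_{i+1} = m_i + R delta s_i
        g_new = R.T @ gradient(m)
        y = g_new - g                    # projected gradient change, y~_i
        if y @ ds > 0.0:                 # curvature condition; guaranteed by a Wolfe search
            Hds = H @ ds
            H = H + np.outer(y, y) / (y @ ds) - np.outer(Hds, Hds) / (ds @ Hds)  # equation 14
        g = g_new
    return m
```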

Figure 3: Metric tensor fields depicted by ellipses and structurally constrained sample selections. A total of 165 samples are chosen for our projected BFGS method.

Structurally constrained sample selection

A properly constructed sparse model s is essential for implementing the projected BFGS method. As suggested by Ma et al. (2011), we construct such a sparse model s by using a structurally constrained sample selection scheme. This selection method considers structures apparent in migrated images. Figure 3a displays a metric tensor field (illustrated by ellipses) computed for a migrated image. As we can see, the orientations and shapes of the ellipses in Figure 3a conform to the imaged structure.

Sparse samples should be representative, so that image-guided interpolation can reconstruct an accurate dense model m from a sparse model s. In general, we must locate samples between reflectors (in particular, we should avoid putting samples on reflectors) and locate samples along structural features. To reduce redundancy, we should place fewer samples along structural features than across them. Moreover, we should put more samples in structurally complex areas than in simple areas. Figure 3b shows a total of 165 scattered sample locations. The chosen locations, together with the corresponding values at these locations, comprise a sparse model space s that is used in the projected BFGS method.

Projected Hessian and inverse

In our projected BFGS method, we employ image-guided interpolation (IGI) (Hale, 2009) as the operator R and the adjoint of image-guided interpolation (Ma et al., 2010) as the operator R^T. IGI interpolates values in the sparse model s to a uniformly sampled dense model m, and the interpolant makes good geological sense because it accounts for structures in the model.

The Hessian update H̃_1 - H̃_0 in the first iteration is shown in Figure 4a. As we can see, the projected BFGS method adds significant off-diagonal components to the initial Hessian H̃_0 = I. Because the line search satisfies the strong Wolfe conditions (Nocedal and Wright, 2000), the projected Hessian is symmetric positive-definite (SPD). Therefore, the inverse of a projected Hessian matrix exists. Figure 4b shows the inverse Hessian H̃_1^{-1}.

Figure 4: The Hessian update H̃_1 - H̃_0 in the 1st iteration and the corresponding inverse Hessian H̃_1^{-1}. An initial Hessian H̃_0 = I is used for the projected BFGS method.
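The symmetric positive-definite property noted above is easy to monitor in practice. The following is a small sketch, assuming the projected Hessian H̃ is stored as a dense n x n array: a Cholesky factorization both verifies that H̃ is SPD and provides an inexpensive way to apply H̃^{-1} to the projected gradient, while the BFGS curvature condition ỹ^T delta s > 0, which a strong-Wolfe line search guarantees, is the property that preserves positive definiteness from one update to the next.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def quasi_newton_direction(H_tilde, g_tilde, y_tilde, ds):
    """Check the curvature condition and apply the inverse projected Hessian.

    H_tilde     : (n, n) projected Hessian approximation
    g_tilde     : projected gradient R^T g
    y_tilde, ds : projected gradient change and sparse step from the last BFGS update
    """
    if y_tilde @ ds <= 0.0:
        raise ValueError("curvature condition violated: update would lose positive definiteness")
    c, low = cho_factor(H_tilde)             # fails if H_tilde is not symmetric positive-definite
    return -cho_solve((c, low), g_tilde)     # quasi-Newton direction p = -H_tilde^{-1} g_tilde
```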

Figure 5: Update directions in the 1st iteration of the conjugate gradient method (a) and our quasi-Newton method (b). Estimated models in the 10th iteration of the conjugate gradient method (c) and our quasi-Newton method (d).

Projected quasi-Newton FWI

The projected BFGS method updates the model m_i in each iteration, and therefore it can be used directly in quasi-Newton solutions to FWI. Figures 5a and 5b show the update directions of the conjugate-gradient and quasi-Newton methods in the 1st iteration, respectively. Compared with Figure 5a, the quasi-Newton update direction (Figure 5b) contains significant low wavenumbers. Furthermore, the inverse projected Hessian H̃_i^{-1} acts as a filter applied to the gradient. As a consequence, the amplitudes of the update direction (Figure 5b) are more balanced.

Figures 5c and 5d show the estimated models in the 10th iteration of the conjugate-gradient and quasi-Newton methods, respectively. Figure 6 shows the data misfit function of the quasi-Newton FWI, which exhibits a faster convergence rate than does the conjugate-gradient method.

Figure 6: Comparison of the data misfit function between the conjugate gradient method and the quasi-Newton method.

CONCLUSION

In this paper we propose a projected BFGS method to iteratively approximate the Hessian matrix in FWI, thereby reducing both computation time and required memory. The projected BFGS method can be used to perform FWI with a quasi-Newton method. As demonstrated by the examples above, quasi-Newton FWI converges in fewer iterations. Compared with the conjugate gradient method, the primary disadvantage of our projected BFGS method is the computational cost of a single iteration, which is relatively high because the line search must satisfy the Wolfe conditions. Therefore, a further investigation of the coefficients in the Wolfe conditions is necessary.

ACKNOWLEDGMENT

This work is sponsored by a research agreement between ConocoPhillips and Colorado School of Mines (SST-20090254-SRA).

REFERENCES

Broyden, C. G., 1970, The convergence of a class of double-rank minimization algorithms: IMA Journal of Applied Mathematics, 6, 222-231.
Fletcher, R., 1970, A new approach to variable metric algorithms: The Computer Journal, 13, 317-322.
Gill, P. E., W. Murray, and M. H. Wright, 1981, Practical optimization: Academic Press, London.
Goldfarb, D., 1970, A family of variable-metric methods derived by variational means: Mathematics of Computation, 109, 23-26.
Hale, D., 2009, Image-guided blended neighbor interpolation of scattered data: SEG Technical Program Expanded Abstracts, 28, 1127-1131.
Kennett, B., M. Sambridge, and P. Williamson, 1988, Subspace methods for large inverse problems with multiple parameter classes: Geophysical Journal International, 94, 237-247.
Ma, Y., D. Hale, B. Gong, and Z. Meng, 2011, Image-guided full waveform inversion: CWP Report.
Ma, Y., D. Hale, Z. Meng, and B. Gong, 2010, Full waveform inversion with image-guided gradient: SEG Technical Program Expanded Abstracts, 29, 1003-1007.
Nocedal, J., and S. J. Wright, 2000, Numerical optimization: Springer.
Oldenburg, D., P. McGillvray, and R. Ellis, 1993, Generalized subspace methods for large-scale inverse problems: Geophysical Journal International, 114, 12-20.
Pratt, R., C. Shin, and G. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, 341-362.
Shanno, D. F., 1970, Conditioning of quasi-Newton methods for function minimization: Mathematics of Computation, 111, 647-656.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, 1259-1266.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, WCC1-WCC26.