Seismic data reconstruction with Generative Adversarial Networks


Ali Siahkoohi 1, Rajiv Kumar 1,2 and Felix J. Herrmann 2
1 Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, 2 Georgia Institute of Technology

Abstract

A main challenge in seismic imaging is acquiring densely sampled data. Compressed Sensing provides theoretical foundations under which the desired recovery can be achieved by applying a sparsity-promoting algorithm to sub-sampled data. The key to successful recovery is a randomized sampling scheme. In this paper, we propose a novel deep-learning-based method for fast and accurate reconstruction of heavily under-sampled seismic data, regardless of the type of sampling. A neural network learns to perform reconstruction directly from data via an adversarial process. Once trained, reconstruction amounts to simply feeding a frequency slice with missing data into the neural network. This adaptive nonlinear model makes the algorithm extremely flexible and applicable to data with an arbitrary type of sampling. Under the assumption that training data are available, the data-driven nature of the method yields superior reconstruction quality even for extremely low sampling rates (as low as 10%).

Introduction

Densely, regularly sampled seismic data yields higher-resolution images with current imaging techniques. Sampling wavefields at the Nyquist rate is expensive and time consuming given the large dimensionality and size of seismic data volumes. In addition, fully sampled data recovered from equally spaced under-sampled data may suffer from residual aliasing artifacts. Compressed Sensing methods recover fully sampled data from highly incomplete sets of linear measurements using sparsity-promoting iterative solvers. The underlying assumption in sparse signal recovery is that data with missing samples is less sparse, in some transform domain, than the fully sampled data. This requires a randomized sampling scheme to break the sparsity in the transform domain. There have been efforts to randomize the acquisition of seismic data (Wason and Herrmann, 2013), but they require special acquisition design. This means that vintage data which has already been shot does not satisfy the underlying assumption of sparse signal recovery. We can categorize previous methods for seismic data reconstruction as follows. Some methods use a transform domain in which the fully sampled data is sparse and recover the signal with sparsity-promoting methods (Sacchi et al., 1998; Herrmann and Hennenfent, 2008; Hauser and Ma, 2012). The redundancy of seismic data can also be exploited when the data is reshaped into a matrix: Oropeza and Sacchi (2011) and Kumar et al. (2013) used rank-minimization techniques to reconstruct the missing values, while Da Silva and Herrmann (2014) and Kreimer et al. (2013) used tensor-completion methods. Finally, dictionary-learning methods have also been used to recover missing values in seismic data (Zhu et al., 2017b; Yarman et al., 2017).
The above-mentioned methods rely on perhaps too-simplifying assumptions about the data, formulating linear mathematical models that explain the data either as a superposition of prototype waveforms from a fixed or learned dictionary or in terms of a matrix factorization. Although data-adaptive schemes such as dictionary learning and matrix factorization have been successful, their linearity may cause them to fail to capture information hidden in (seismic) data. By giving up linearity and using adaptive nonlinear models, combined with insights from game theory, we arrive at a new formulation for missing-trace interpolation in which the signal model is no longer linear, i.e., the data is no longer considered a superposition of basic elements. While there is an abundance of nonlinear signal models, not least those defined in terms of the wave equation itself, which is nonlinear in the medium properties, we propose the use of deep convolutional neural networks (CNNs, Krizhevsky et al., 2012) and in particular the state-of-the-art Generative Adversarial Networks (GANs, Goodfellow et al., 2014). GANs are able to learn to sample from high-dimensional, complex probability distributions without making strong assumptions about the data. Nonlinear models outperform linear approximations of complex nonlinear real-world processes because they can capture the nonlinearities of observed data. On the other hand, formulating an appropriate nonlinear function to model a particular phenomenon requires knowledge of the underlying data-generation process, which may be unknown. In principle, wave equations are the ideal nonlinear data-generation process for seismic data, provided they capture the full wave physics, which is not always the case, for various reasons. By the universal approximation theorem (Csáji, 2001), neural networks can approximate any continuous function under some mild assumptions.
This motivates us to use neural networks, in particular GANs with a probabilistic approach, to implicitly model the complex nonlinear probability distribution of the data, rather than the nonlinear physical model itself. By deploying this model, and with access to a training dataset, we hope to outperform current methods for large percentages of missing traces, without imposing constraints on the type of (random) sampling.

Generative Adversarial Networks

Suppose we have a set of samples $S_X = \{x_i\}_{i=1}^{N}$ drawn from an initial distribution $p_X(x)$ and a set of samples $S_Y = \{y_i\}_{i=1}^{N}$ drawn from a target distribution $p_Y(y)$, where $N$ is the number of samples in each set. The goal is to learn a mapping $G : X \to Y$; in other words, not only to map $S_X$ to $S_Y$, but to map all samples from $p_X(x)$ to samples from $p_Y(y)$. GANs consist of two deep CNNs, the generator ($G$) and the discriminator ($D$), which achieve the goal of training the generative network via an adversarial game. The game is described as follows: $D : \mathbb{R}^{m \times n} \to [0,1]$ estimates the probability that a sample came from $p_Y(y)$ rather than from the output of $G$, while the generator $G$ maps $p_X(x)$ to $p_Y(y)$ such that its output is difficult for $D$ to distinguish from samples of $p_Y(y)$. Let $\theta^{(G)}$ and $\theta^{(D)}$

be the sets of parameters of the networks $G$ and $D$, respectively. We can describe the GAN objective as a combination of the generator and discriminator objective functions, as follows (Mao et al., 2016):

$$L_G = \mathbb{E}_{x \sim p_X(x)}\left[\left(1 - D_{\theta^{(D)}}\big(G_{\theta^{(G)}}(x)\big)\right)^2\right],$$
$$L_D = \mathbb{E}_{x \sim p_X(x)}\left[\left(D_{\theta^{(D)}}\big(G_{\theta^{(G)}}(x)\big)\right)^2\right] + \mathbb{E}_{y \sim p_Y(y)}\left[\left(1 - D_{\theta^{(D)}}(y)\right)^2\right], \qquad (1)$$

where $x$ and $y$ are random samples drawn from the distributions $p_X(x)$ and $p_Y(y)$, respectively. $G$ and $D$ are differentiable with respect to $x$, $y$, $\theta^{(G)}$, and $\theta^{(D)}$, but the GAN objective is highly nonlinear and non-convex because of the network structure used for $G$ and $D$, as described later. In practice, we do not have direct access to the distributions $p_X(x)$ and $p_Y(y)$. Instead, we have sets of samples (e.g., frequency slices, each corresponding to a shot) for each of them, $S_X$ and $S_Y$. To obtain training data, we rely on the assumption that either we have some fully sampled shots at random locations, or we can use training data from one part of the model for the rest. If we define $p_X(x)$ and $p_Y(y)$ to be the distributions of partial measurements and of fully sampled data, respectively, we can make the network perform reconstruction by minimizing the following reconstruction objective over $\theta^{(G)}$:

$$L_{\mathrm{reconstruction}} = \mathbb{E}_{x \sim p_X(x),\, y \sim p_Y(y)}\left[\,\|G(x) - y\|_1\,\right], \qquad (2)$$

where $x$ and $y$ are paired samples, i.e., given $x$ the desired reconstruction is $y$. We observed that using the $\ell_1$ norm as the misfit measure in the reconstruction objective gives a better recovery SNR; a similar choice was made by Isola et al. (2016). We can thus summarize the game for training the GAN by combining the reconstruction objective with the GAN objective:

$$\min_{\theta^{(G)}}\ \{L_G + \lambda L_{\mathrm{reconstruction}}\}, \qquad \min_{\theta^{(D)}}\ \{L_D\}. \qquad (3)$$

To solve this problem, we approximate the expectations by taking random paired subsets of samples from $S_X$ and $S_Y$ without replacement, i.e., picking random subsets of paired sub-sampled and fully sampled frequency slices, and alternately minimizing $L_G$ and $L_D$ over $\theta^{(G)}$ and $\theta^{(D)}$, respectively. Using the optimized parameters, we can map $p_X(x)$ to $p_Y(y)$ with the generator network (Goodfellow et al., 2014). After extensive parameter testing, we found $\lambda = 100$ to be an appropriate value for this hyper-parameter, which controls the relative importance of the reconstruction objective. The excellent performance of CNNs in practice (Krizhevsky et al., 2012) suggests that they are reasonable models for our generator and discriminator. For completeness, we briefly describe the general structure of CNNs; the network architecture used for both our generator and discriminator is similar to the choices made by Zhu et al. (2017a), with minor changes to fit the dimensions of our frequency slices. In general, for given parameters $\theta$, CNNs consist of several layers; within each layer the input is convolved with multiple kernels, followed by either up- or down-sampling and the element-wise application of a nonlinear function (e.g., thresholding). The network architecture, including the number of layers, the dimension and number of convolutional kernels in each layer, the amount of up- or down-sampling, and the nonlinear functions used, is designed to fit the application of interest. Once the architecture is fixed, the set of parameters $\theta$ of the designed CNN consists of the convolution kernels in all its layers. During training, $\theta$ is optimized in problem (3) for the given input-output pairs.

GANs for data reconstruction

To perform seismic data reconstruction with the above framework, we define the set $S_X$ to be a set of frequency slices, at a certain frequency band, with missing entries.
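As a concrete illustration of the objectives in equations (1)-(3), the following minimal NumPy sketch evaluates the least-squares GAN losses and the combined generator objective on random stand-in arrays. The discriminator outputs and slices here are placeholders rather than the paper's trained networks, and the batch mean stands in for the expectation; the reconstruction term is a mean absolute error, a scaled version of the $\ell_1$ misfit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for discriminator outputs in [0, 1] over a batch of 8 slices:
# d_fake plays the role of D(G(x)), d_real plays the role of D(y).
d_fake = rng.uniform(0.0, 1.0, size=8)
d_real = rng.uniform(0.0, 1.0, size=8)

# Least-squares GAN objectives, equation (1):
L_G = np.mean((1.0 - d_fake) ** 2)
L_D = np.mean(d_fake ** 2) + np.mean((1.0 - d_real) ** 2)

# Reconstruction objective, equation (2), on stand-in 202 x 202 x 2 slices
# (mean absolute error as a scaled l1 misfit between G(x) and y):
g_x = rng.normal(size=(202, 202, 2))   # stand-in for generator output G(x)
y = rng.normal(size=(202, 202, 2))     # stand-in for fully sampled target y
L_rec = np.mean(np.abs(g_x - y))

# Combined generator objective, equation (3), with lambda = 100 as in the text.
# In training, this and L_D would be minimized alternately over the two
# parameter sets.
lam = 100.0
total_G = L_G + lam * L_rec
```

In the actual method these quantities are differentiated through the networks and minimized alternately with Adam, as described below; the sketch only shows how the loss values combine.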
We reshape each frequency slice of size $202 \times 202$ into a three-dimensional tensor of size $202 \times 202 \times 2$, so that the real and imaginary parts of the slice are separated. Similarly, we define $S_Y$ to be the set of corresponding frequency slices without missing values. We stop the training process when we reach the desired recovery accuracy on the training dataset. Once training is finished, we can reconstruct frequency slices not in the training dataset by simply feeding a frequency slice with missing data into $G$. Network evaluation is extremely fast, taking only hundreds of milliseconds. We implemented our algorithm in TensorFlow. To solve optimization problem (3), similar to Zhu et al. (2017a), we use the Adam optimizer with momentum parameter $\beta = 0.9$, a random batch size of 1, and a linearly decaying learning rate with initial value $\mu = 2 \times 10^{-4}$ for both the generator and discriminator networks.
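The complex-to-two-channel reshaping described above can be sketched in NumPy as follows; the slice here is a random stand-in, and only the shape handling follows the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# A stand-in 202 x 202 complex-valued frequency slice.
slice_c = rng.normal(size=(202, 202)) + 1j * rng.normal(size=(202, 202))

# Stack real and imaginary parts along a third axis so the network sees
# a real-valued 202 x 202 x 2 tensor, as the text describes.
slice_t = np.stack([slice_c.real, slice_c.imag], axis=-1)
assert slice_t.shape == (202, 202, 2)

# After reconstruction, the two channels recombine into a complex slice.
slice_back = slice_t[..., 0] + 1j * slice_t[..., 1]
assert np.allclose(slice_back, slice_c)
```

The round trip is lossless, so nothing is sacrificed by presenting the complex data to a real-valued network in this form.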

Experiments

The frequency slices used in this work are generated with finite differences from the 3D Overthrust model. The data volume has $202 \times 202$ receivers and $102 \times 102$ sources sampled on a 25 m grid. We extract the 5.31 Hz frequency slice. For the training dataset, we tried to choose as few randomly selected slices as possible while achieving the desired recovery accuracy. In our first experiment, we defined $S_X$ to be the set of 250 slices with 90% randomly missing entries (with a varying sampling mask) and 250 slices with column-wise missing entries at the same sampling rate. In the second experiment, in order to show the capability of our method to recover large contiguous areas of missing data (e.g., because of a platform), we recover frequency slices that are missing half of their samples as a square in the middle. In this case, $S_X$ consists of 2000 slices. Figure 1 shows the recovery of the real part of a single slice for these experiments. The average reconstruction SNR over slices not in the training dataset is shown in Table 1. Based on the results obtained and the number of training samples used in the second experiment, we observe that the reconstruction quality is lower when there are large gaps in the observed data; since other methods have not reported success in reconstructing this type of missing data, the recovery is nonetheless reasonable.

Missing type    Missing percentage    Average recovery SNR (dB)
Column-wise     90%                   33.7072
Randomly        90%                   29.6543
Square          50%                   20.6053

Table 1: Average reconstruction SNR.

Discussion and conclusions

We introduced a deep learning scheme for the reconstruction of heavily sub-sampled seismic data. By giving up linearity and using an adaptive nonlinear model, we are able to reconstruct seismic data with an arbitrary type of sampling, which no other method has demonstrated. To achieve this result, however, we assumed that training data is available, which requires having a small percentage of fully sampled shots.
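A note on the reported metric: the text does not spell out its SNR convention. A common definition of recovery SNR in dB, which the following sketch assumes, is $20\log_{10}(\|y\|_2 / \|y-\hat{y}\|_2)$, where $y$ is the true slice and $\hat{y}$ the reconstruction.

```python
import numpy as np

def recovery_snr_db(y_true, y_est):
    """Recovery SNR in dB under the common convention
    20 * log10(||y||_2 / ||y - y_est||_2)."""
    num = np.linalg.norm(y_true)
    den = np.linalg.norm(y_true - y_est)
    return 20.0 * np.log10(num / den)

# Toy check: a uniform 1% relative error gives approximately 40 dB.
y = np.ones(100)
y_hat = y + 0.01
snr = recovery_snr_db(y, y_hat)
```

Under this convention, the 29-34 dB values in Table 1 would correspond to relative errors of a few percent, and the 20.6 dB square-gap result to roughly 10%.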
In the first experiment, we showed that by recording 5% of the shots at random locations with the desired sampling rate, we are able to reconstruct all the other slices (at some frequency) with 90% randomly or column-wise missing entries. As a next step, we plan to incorporate this scheme into multiple prediction.

Acknowledgements

This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium.

References

Csáji, B.C. [2001] Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24, 48.
Da Silva, C. and Herrmann, F.J. [2014] Low-rank promoting transformations and tensor interpolation - applications to seismic data denoising. In: 76th EAGE Conference and Exhibition 2014.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. [2014] Generative adversarial nets. Advances in Neural Information Processing Systems, 2672-2680.
Hauser, S. and Ma, J. [2012] Seismic data reconstruction via shearlet-regularized directional inpainting.
Herrmann, F.J. and Hennenfent, G. [2008] Non-parametric seismic data recovery with curvelet frames. Geophysical Journal International, 173(1), 233-248.
Isola, P., Zhu, J.Y., Zhou, T. and Efros, A.A. [2016] Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004.
Kreimer, N., Stanton, A. and Sacchi, M.D. [2013] Tensor completion based on nuclear norm minimization for 5D seismic data reconstruction. Geophysics, 78(6), V273-V284.
Krizhevsky, A., Sutskever, I. and Hinton, G.E. [2012] ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 1097-1105.
Kumar, R., Aravkin, A.Y., Mansour, H., Recht, B. and Herrmann, F.J. [2013] Seismic data interpolation and denoising using SVD-free low-rank matrix factorization. In: 75th EAGE Conference & Exhibition incorporating SPE EUROPEC 2013.
Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z. and Smolley, S.P. [2016] Least squares generative adversarial networks. arXiv preprint arXiv:1611.04076.

Figure 1: Reconstruction of a frequency slice with arbitrary types of sampling. a) True; b) 90% missing, column-wise; c) 90% missing, randomly; d) 50% missing as a square in the middle; e, f, g) reconstructed slices corresponding to b, c, d; h, i, j) differences corresponding to e, f, g.

Oropeza, V. and Sacchi, M. [2011] Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis. Geophysics, 76(3), V25-V32.
Sacchi, M.D., Ulrych, T.J. and Walker, C.J. [1998] Interpolation and extrapolation using a high-resolution discrete Fourier transform. IEEE Transactions on Signal Processing, 46(1), 31-38.
Wason, H. and Herrmann, F.J. [2013] Time-jittered ocean bottom seismic acquisition. In: SEG Technical Program Expanded Abstracts 2013, Society of Exploration Geophysicists, 1-6.
Yarman, C.E., Kumar, R. and Rickett, J. [2017] A model-based data-driven dictionary learning for seismic data representation. Geophysical Prospecting.
Zhu, J.Y., Park, T., Isola, P. and Efros, A.A. [2017a] Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593.
Zhu, L., Liu, E. and McClellan, J.H. [2017b] Joint seismic data denoising and interpolation with double-sparsity dictionary learning. Journal of Geophysics and Engineering, 14(4), 802.