GPGPU-Aided 3D Staggered-grid Finite-difference Seismic Wave Modeling

Chang Cai 1, Haiqing Chen 2, Ze Deng 1, Dan Chen 1,*, Samee U. Khan 3, Ke Zeng 1, Minxiao Wu 1

1 School of Computer Science, China University of Geosciences, Wuhan, China
2 School of Optoelectronics, Hankou University, Wuhan, China
3 Department of Electrical and Computer Engineering, North Dakota State University, North Dakota, US
* Corresponding author's e-mail: chendan@pmail.ntu.edu.sg

Abstract

Finite difference is a simple, fast and effective numerical method for seismic wave modeling, and it has been widely used in forward waveform inversion and reverse time migration. However, the intensive computation required by three-dimensional seismic forward modeling has restricted the industrial application of 3D pre-stack reverse time migration and inversion. Graphics processing units (GPUs) have emerged as an effective alternative to traditional general-purpose processors and are increasingly capable of accelerating large-scale scientific computing. To address this problem, this paper develops a parallelized 3D staggered-grid finite-difference method, named G-3DFD, using general-purpose computing on graphics processing units (GPGPU). We analyze the three-dimensional staggered-grid finite-difference method for implementation on the GPU, which makes the industrial application of 3D pre-stack reverse time migration and inversion possible. Experiments show that G-3DFD improves runtime performance by up to 88 times on a modern GPGPU platform compared with the original CPU implementation.

Keywords: Seismic Wave Modeling; 3D Stencil Computation; GPGPU; Parallel Computing

I. INTRODUCTION

Finite-difference (FD) techniques in the time domain (FDTD) are widely used to solve wave equations such as Maxwell's equations [1] or the seismic wave equation [2], and have been used to solve the Navier-Stokes equations [3] as well. A thorough review of FD applied to the seismic wave equation is given in [4]. FD is often preferred because of its simplicity. When more geometrical flexibility is needed, for instance to handle geometrically complex models, other techniques such as a pseudo-spectral technique [5][6], a boundary-element method [7][8], a spectral-element method [11] or a discontinuous Galerkin technique [12][13] are needed.

FD is often used in conjunction with a perfectly matched layer (PML) to absorb waves on the artificial edges of the numerical grid and thus mimic an infinite or semi-infinite medium [14]. A convolution PML (CPML) can be used to improve the behavior of the discrete PML for waves impinging on the artificial edges of the grid at grazing incidence [16]; see [17] for unsplit CPMLs for seismic waves. More recently, a formulation of the unsplit CPML that can easily be extended to higher-order time schemes, called the auxiliary differential equation PML (ADE-PML), was introduced in [18] for Maxwell's equations and in [19] for the seismic wave equation. An improved sponge layer, improperly called the split multiaxial PML (M-PML), was suggested in [20], but the perfectly matched property of Bérenger's PML is lost because of the coupling introduced between derivatives along multiple grid axes.

In recent years, the graphics processing unit (GPU) has been widely used to accelerate general-purpose applications. A number of physical computing problems have been solved on GPUs, e.g. molecular dynamics simulations, fluid dynamics simulations and astrophysical calculations [21]. Regarding FD, several applications have been ported to GPUs. Some spectral-element methods have been proposed for a single-GPU platform in [6], while other methods are based on multiple GPUs [22][23]. GPU programming on NVIDIA graphics cards has become significantly easier with the introduction of the CUDA programming language, which is relatively easy to learn because its syntax is similar to C.
Regarding FD for seismic reverse time migration in an acoustic medium with constant density, Abdelkhalek and Micikevicius (from NVIDIA Corporation, the developer of the GPU hardware and of CUDA) have recently introduced optimized implementations. Abdelkhalek extended this work to the acoustic case with heterogeneous density [24]. In this study, we use CUDA to solve 3D staggered-grid finite-difference seismic wave modeling in parallel in a more complex, fully heterogeneous environment. We propose some new techniques adapted to the CUDA programming model, including its kernels, thread hierarchy and memory hierarchy. Since the number of grid points is very large, we use one thread to process the points in a row. This method not only addresses the limited resources on the GPU, but also easily achieves coalesced memory access, which reduces the number of global memory accesses and makes full use of the limited shared memory in the GPU hardware. A GPU has many more cores and greater compute capability than a CPU, so in our program all calculations run on the GPU, while memory management and result output are executed on the CPU. We implement the 3D staggered-grid finite-difference seismic wave modeling both in C running on the CPU and in CUDA running on the GPU. Performance comparisons between them show that the GPU is well suited to parallel computation, especially for highly compute-intensive problems. The speedup ranges from 57 to 88, which reduces the run time from about one hour to about half a minute and makes the industrial application of 3D pre-stack reverse time migration and inversion possible.

II. METHODS

A. 3D Staggered-grid Finite-difference

This section describes the formulation of the AWP-ODC numerical model and analyzes its computation steps. As stated in [25], AWP-ODC solves a 3D velocity-stress wave equation using a staggered-grid finite-difference method that is fourth-order accurate in space and second-order accurate in time. The coupled system of partial differential equations involves the particle velocity vector v and the symmetric stress tensor σ [26]. Suppose:

$$\sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{yx} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{zx} & \sigma_{zy} & \sigma_{zz} \end{pmatrix} \qquad (1)$$

Then the governing elastodynamic equations are [45]:

$$\partial_t \mathbf{v} = \frac{1}{\rho} \nabla \cdot \sigma \qquad (2)$$

$$\partial_t \sigma = \lambda (\nabla \cdot \mathbf{v}) \mathbf{I} + \mu \left( \nabla \mathbf{v} + \nabla \mathbf{v}^{T} \right) \qquad (3)$$

where λ and μ are the Lamé coefficients and ρ is the constant density. Simplifying formulas (2) and (3) leads to three scalar-valued equations for the velocity vector components and six scalar-valued equations for the stress tensor components. Three of them are listed below, one for a velocity component and two for stress components:

$$\frac{\partial v_x}{\partial t} = \frac{1}{\rho} \left( \frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{xy}}{\partial y} + \frac{\partial \sigma_{xz}}{\partial z} \right) \qquad (4)$$

$$\frac{\partial \sigma_{xx}}{\partial t} = (\lambda + 2\mu) \frac{\partial v_x}{\partial x} + \lambda \left( \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z} \right) \qquad (5)$$

$$\frac{\partial \sigma_{xy}}{\partial t} = \mu \left( \frac{\partial v_x}{\partial y} + \frac{\partial v_y}{\partial x} \right) \qquad (6)$$

AWP-ODC is a memory-intensive application: twenty-one 3D variable arrays are involved in the main computation loop. In addition to the three velocity vector components and the six symmetric stress tensor components, six temporary variables (r1, r2, r3, r4, r5, r6) and six constant coefficients (λ, μ, ρ, the quality factors Qs for S waves and Qp for P waves, and the Cerjan boundary condition variable Cj [27]) are used in the numerical modeling. Fig. 1 presents the pseudo-code of the computation kernels in the main loop for formulas (4) and (5). The algorithm mainly includes 5 steps:

Step 1: Initialize the parameters of the original model, set the size of the 3D grid and the time step, and calculate the wavelet for the simulation.
Step 2: At time step t, compute the differential of the stress along the spatial directions and update the values of the velocities.
Step 3: Compute the differential of the velocity field along the axis directions and update the values of the stresses.
Step 4: Check whether t > tmax. If the termination condition is not satisfied, set t = t + 1, add the wavelet source into the simulation and return to Step 2; otherwise, go to Step 5.
Step 5: Output the results of the simulation.

Fig. 1. AWP-ODC algorithm (flowchart: Start, Initiate, add wavelet source, velocity computation, stress computation, t > tmax?, output result, End).

In the AWP-ODC main loop, the 13-point asymmetric stencil computation for v_x performs 12 reads from three different 3D arrays and a single write to one 3D array. The stress component calculation includes two 1D asymmetric stencils, with only 4 reads from a single 3D array and no writes. TABLE I summarizes the analysis of the kernels in Fig. 1. It shows that AWP-ODC is a memory-bound application because of its low FLOPS-to-bytes ratio (the average operational intensity is around 0.5), which means that the application has poor temporal locality and that its performance is dominated by the memory system rather than by arithmetic throughput [28].

TABLE I. ANALYSIS OF COMPUTATION KERNELS IN AWP-ODC MAIN LOOP
Columns: Kernels, Reads, Writes, FLOPS, FLOPS/Bytes
Rows: Velocity Computation Kernel, Stress Computation Kernel, Total
(The numeric entries are missing from this transcription.)
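The main loop of Fig. 1 (Steps 2 to 4) can be illustrated with a minimal C sketch. It is deliberately reduced to one spatial dimension and a homogeneous medium, which is an assumption made for brevity and not the 3D heterogeneous scheme of the paper. The function names, the argument names and the choice of storing v[i] half a cell to the left of s[i] are all hypothetical; the weights 9/8 and -1/24 are the standard fourth-order staggered-grid coefficients.

```c
#include <stddef.h>

/* Standard fourth-order staggered-grid difference weights. */
#define C1 (9.0f / 8.0f)
#define C2 (-1.0f / 24.0f)

/* Step 2 (sketch): update velocity from the spatial derivative of stress.
   v[i] is assumed to sit half a cell to the left of s[i]. */
static void update_velocity(float *v, const float *s, size_t n, float dt_over_rho_h)
{
    for (size_t i = 2; i + 2 < n; ++i) {
        float ds = C1 * (s[i] - s[i - 1]) + C2 * (s[i + 1] - s[i - 2]);
        v[i] += dt_over_rho_h * ds;
    }
}

/* Step 3 (sketch): update stress from the spatial derivative of velocity. */
static void update_stress(float *s, const float *v, size_t n, float dt_modulus_over_h)
{
    for (size_t i = 2; i + 2 < n; ++i) {
        float dv = C1 * (v[i + 1] - v[i]) + C2 * (v[i + 2] - v[i - 1]);
        s[i] += dt_modulus_over_h * dv;
    }
}

/* Steps 2 to 4 (sketch): inject the wavelet sample at the source index,
   update velocity, update stress, and repeat until tmax steps are done.
   Returns the number of time steps executed. */
int run_model(float *v, float *s, size_t n, const float *wavelet, int tmax,
              size_t src, float dt_over_rho_h, float dt_modulus_over_h)
{
    int t;
    for (t = 0; t < tmax; ++t) {
        s[src] += wavelet[t];
        update_velocity(v, s, n, dt_over_rho_h);
        update_stress(s, v, n, dt_modulus_over_h);
    }
    return t;
}
```

Each update touches only a handful of neighbouring samples and performs very few floating-point operations per value loaded, which is the low FLOPS-to-bytes pattern that the analysis above attributes to the 3D kernels.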

B. General-purpose Computing on Graphics Processing Units (GPGPU) and Compute Unified Device Architecture (CUDA)

The graphics processing unit (GPU) is specialized for compute-intensive and highly parallel computations [29], especially problems that can be expressed as data-parallel computations (e.g., matrix multiplication). Owing to their highly parallel architecture, modern GPUs are capable of a theoretical peak performance that is an order of magnitude higher than that of mainstream CPUs [30]. This feature, together with their high performance-to-price ratio and widespread availability, has propelled GPUs to the forefront of high-performance computing, and they are now accessible even on most commodity computers. General-purpose computing on graphics processing units (GPGPU) has recently boomed with the enhanced programmability of GPUs: rather than depending on standard graphics APIs or shader languages, which force the developer to reformulate algorithms in graphics metaphors, developers can use general-purpose programming languages.

Fig. 2. Hierarchy of threads and memory in CUDA (a grid contains blocks (0, 0) to (n, m); a block contains threads (0, 0) to (i, j); global memory serves all grids, shared memory is per block, and local memory is per thread).

Relatively high abstraction levels have emerged, and NVIDIA's Compute Unified Device Architecture (CUDA) is a salient example. CUDA provides a software environment that allows programming the GPU with a slightly extended version of C, so that a CUDA-based application can execute on the GPU in a massively parallel fashion. A CUDA application defines C functions, called kernels, to be explicitly executed on the GPU (referred to as the "device"). Normally the kernels are invoked from the main executable of the application on the CPU (the "host"), and each kernel is executed N times by N individual CUDA threads. As illustrated in Fig. 2, CUDA threads are organized in a one- to three-dimensional manner, where threads are grouped into blocks and blocks are in turn grouped into grids. Threads in a block can cooperate by efficiently sharing data through fast shared memory and by synchronizing their execution to coordinate memory access. Threads, memories and synchronization are exposed to developers for very fine-grained data and thread parallelism, e.g., dealing with a single variable of float type per thread. Application developers should partition the problem into a set of fine-grained subtasks that can be mapped to threads accordingly. A more detailed introduction to GPGPU and CUDA can be found in [29].

III. PARALLELIZATION

A. GPGPU-aided 3D Staggered-grid Finite-difference

This study focuses on a GPGPU-aided implementation of the 3D staggered-grid finite-difference method. The main difficulty when implementing a finite-difference code on a GPU comes from the highly compute-intensive stencil computation. For a fourth-order spatial operator, the thread that handles the calculation of point (i, j, k) needs to know the fields (and therefore access the arrays) at points (i, j, k), (i + 1, j, k), (i + 2, j, k), (i − 1, j, k), (i − 2, j, k), (i, j + 1, k) and so on, that is, at the point that it handles and at its neighboring points. This implies that 13 accesses to global memory are needed on the GPU to handle each grid point, which causes a high access overhead because of the long access latency of global memory. However, because threads that belong to the same block can access data in faster shared memory, it is possible to significantly reduce the number of global memory accesses per grid point and thus drastically improve performance.
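The access pattern described above can be made concrete with a short C sketch of the v_x update at one grid point. It is a minimal illustration under stated assumptions, not the authors' kernel: the flat array layout with z varying fastest, the staggering directions chosen for each stress component, and all function and parameter names (idx3, diff4, update_vx_point, update_vx_row, coef) are hypothetical. The row function mirrors the one-thread-per-row idea mentioned in the introduction, written here as a plain CPU loop.

```c
#include <stddef.h>

/* Flat index into an nx*ny*nz array stored with k (the z axis) varying
   fastest. This layout is an assumption of the sketch. */
static size_t idx3(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* Fourth-order staggered difference built from four samples: the near pair
   has weight 9/8 and the far pair has weight -1/24. */
static float diff4(float near_hi, float near_lo, float far_hi, float far_lo)
{
    return (9.0f / 8.0f) * (near_hi - near_lo) - (1.0f / 24.0f) * (far_hi - far_lo);
}

/* Update vx at one interior point: 12 stress reads (4 from each of three
   arrays) plus the read-modify-write of vx itself. */
void update_vx_point(float *vx, const float *sxx, const float *sxy, const float *sxz,
                     size_t i, size_t j, size_t k, size_t ny, size_t nz, float coef)
{
    float dxx = diff4(sxx[idx3(i + 1, j, k, ny, nz)], sxx[idx3(i, j, k, ny, nz)],
                      sxx[idx3(i + 2, j, k, ny, nz)], sxx[idx3(i - 1, j, k, ny, nz)]);
    float dxy = diff4(sxy[idx3(i, j, k, ny, nz)], sxy[idx3(i, j - 1, k, ny, nz)],
                      sxy[idx3(i, j + 1, k, ny, nz)], sxy[idx3(i, j - 2, k, ny, nz)]);
    float dxz = diff4(sxz[idx3(i, j, k, ny, nz)], sxz[idx3(i, j, k - 1, ny, nz)],
                      sxz[idx3(i, j, k + 1, ny, nz)], sxz[idx3(i, j, k - 2, ny, nz)]);
    vx[idx3(i, j, k, ny, nz)] += coef * (dxx + dxy + dxz);
}

/* Work of one "thread" in a one-row-per-thread mapping: sweep all interior
   x positions for a fixed (j, k). The caller must ensure 2 <= j < ny - 1
   and 2 <= k < nz - 1 so that every neighbour index is valid. */
void update_vx_row(float *vx, const float *sxx, const float *sxy, const float *sxz,
                   size_t nx, size_t ny, size_t nz, size_t j, size_t k, float coef)
{
    for (size_t i = 1; i + 2 < nx; ++i)
        update_vx_point(vx, sxx, sxy, sxz, i, j, k, ny, nz, coef);
}
```

With this layout, neighbouring (j, k) rows that differ only in k read adjacent memory locations at every step of the x sweep, which is the property a coalesced GPU mapping relies on.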
The number of threads and the amount of shared memory available to the blocks that run at the same time are limited, while the number of grid points grows as O(n^3) with the grid size. There is therefore not enough shared memory to use 3D blocks large enough to sufficiently reduce the ratio of global memory accesses per grid point, and a purely 3D approach cannot be implemented efficiently. We therefore turn to a more efficient 2D approach, which increases the computation workload of each thread in order to reduce the number of threads.

We parallelize the 3D staggered-grid finite-difference application at the grid level. In principle each grid point could be mapped to an individual task, but the number of points is too large and the workload of each point is too small, so we map one row of data points to one thread task, as indicated in Fig. 3. We also design the program to access global memory in a coalesced way.

Fig. 3. Parallelization at the thread level: one row of data units is mapped to one thread task.

Full utilization of on-chip memory is the key to achieving high performance in GPU-based applications. Our optimization mechanisms can be summarized in four steps:

Step 1: Read-Only Memory Cache. Texture memory and constant memory in the GPU device each have their own cache. As discussed above, six 3D variable arrays hold constant coefficients. We put them into texture memory to take advantage of the texture cache. All scalar constants are put into constant memory to save registers. The other 15 arrays are stored in global memory because their values are updated during the iteration.

Step 2: 2D Domain Decomposition. CUDA is an extension of C/C++, so for 3D arrays memory access is fast along the z axis and slow along the x axis. To obtain a better cache hit rate and to allow all threads in the same warp to access data along the fast axis z instead of the slow axis x, we decompose the 3D grid only along the y and z axes. Each thread calculates the entire NX extent for a given 2D (y, z) location, as shown in Fig. 4. Since the thread block is also a three-dimensional structure and gives the best performance in its x direction, threads in the x direction must correspond to the z direction of the 3D grid, while threads in the y direction correspond to the y direction of the 3D grid. Since the 32 threads of a warp are executed at the same time, the number of threads in the x direction is chosen to be a multiple of 32 for better performance.

Fig. 4. Decomposition for GPU kernels (the 3D grid (NX, NY, NZ) is divided into chunks (NX, Ny, Nz), each handled by a block of threads (tx, ty, 1)).

Step 3: Global Memory Coalescing. For maximum performance, memory accesses to global memory need to be coalesced. CUDA provides the cudaMallocPitch function and the cudaPitchedPtr structure to help ensure aligned memory access. However, instead of using these library facilities, we manually pad zeros onto the boundaries of the 3D grid along the z axis to align memory with the inner region. The padding size plus NZ should also be a multiple of 32, and the two layers of ghost cells are included in the padding.

Step 4: Memory Management. GPU memory is allocated and managed by the CPU. Memory optimization is the most important area of performance optimization, and minimizing data transfer between the host and the device is an effective measure. Our method batches multiple small transfers into one transfer, which significantly reduces the communication cost between GPU and CPU. Since the GPU has limited memory resources, we design the program with only one transfer from CPU to GPU during the whole computation and reuse the GPU memory, which allows the program to handle much larger 3D seismic wave models than the original methods. Since the GPU has more cores and computes faster than the CPU, all calculations except memory copies are done by the GPU; see Fig. 5 for more details.

Fig. 5. The flow of the GPGPU-aided 3D staggered-grid finite-difference method. The flowchart runs from START to END through the following calls:
Initiat(nMax, Vx, Vy, Vz, P, Bx, By, Bz, Pla, Rho, Source, Eponge)
Wavelet<<<block_size, thread_size>>>(nmax, dt, source, lwav);
MODEL(RLA, RHO, BX, BY, BZ, DTDX, NX, NY, NZ);
Momentum<<<GridSize, BlockSize>>>(vx, vy, vz, bx, by, bz, p);
Loose_ends_k<<<GridSize, BlockSize>>>(vx, vy, bx, by, p);
Loose_ends_j<<<GridSize, BlockSize>>>(vx, vy, bx, by, p);
Loose_ends_i<<<GridSize, BlockSize>>>(vx, vy, bx, by, p);
The kernels run as parallel tasks on the device, independently of the host, and each thread is associated with a unique ID within its thread block.

The flow in Fig. 5 consists of three steps:

Step 1: Initialization phase. This step sets the size of the grid and defines the boundaries of the seismic wave model. From these we obtain the size of each parameter and allocate exactly the amount of GPU memory needed for the next step's computation.

Step 2: After initialization, the computation on the GPU begins. A two-dimensional thread block is defined to identify the CUDA threads. The algorithm then sets the PML boundary, initializes the wavefield variables and generates the source sequence.

Step 3: This step performs the seismic wave modeling on the GPU to speed up the calculation of the wavefield, including the spatial differential calculation and the matrix or vector products and sums. In order to save and map the results, the required results must be copied from the GPU back to the CPU. Alternatively, GPU rendering can be used to output wavefield snapshots directly.

IV. EXPERIMENT AND PERFORMANCE COMPARISONS

The new CUDA implementation has been carefully validated for correctness. We compare our GPU version with the traditional CPU-based method. The results are shown in Fig. 6. The stress at the source point is recorded for the first 200 time steps of both methods. The results are almost identical, with tiny differences caused by the different programming languages and compilers.

Fig. 6. Accuracy comparison of waveforms computed on GPU and CPU.

On a single GTX680 card with 2 GB of memory, we use a simple model in order to compute as large a grid as possible. It has nine 3D grid parameters. By trying to allocate increasingly

larger chunks of memory, we find that almost 1.9 GB of memory is available to the user. The largest 3D model that we can use has a size of 37.6 km × 37.6 km × 37.6 km, discretized with grid cells of size 100 m in the three spatial directions (the 376*376*376 case in TABLE II).

To test the performance of the CUDA code running on the GPU, we compare it with the performance of the C code running on the CPU. The GPU card is an NVIDIA GTX 680 and the CPU is an Intel(R) Core(TM) i7. We compile the C code in VS2008, and the GPU code with the standard NVIDIA nvcc compiler of CUDA version 5.0. To investigate the impact of different factors on the proposed method, we take several measurements on the CPU and on the GTX680 card. TABLE II compares the performance of GPU and CPU. All benchmark experiments run for 200 time steps, and the measurement is the total time used during the modeling.

TABLE II. PERFORMANCE COMPARISON BETWEEN CPU AND GPU IN 200 TIME STEPS
Columns: Grid Size, CPU (i7) Time (s), GPU Time (s), Speedup
Rows: 128*128*128, 192*192*192, 256*256*256, 368*368*368, 376*376*376
(The timing entries are missing from this transcription.)

From TABLE II we can observe that:

1) As the grid size grows, the amount of computation grows as O(n^3). The CPU needs more than an hour for grids larger than 370*370*370, so computing large 3D-grid seismic wave models on the CPU is a big challenge.

2) The GPU is good at fine-grained computation, especially matrix operations. It takes the GPU only about half a minute to compute the 368*368*368 3D grid, which is 88 times faster than the CPU.

3) Even though the GPU can provide a large speedup, its resources are limited, so the grid size we can calculate and the speedup we can obtain depend heavily on GPU resources. TABLE II shows that the speedup increases with grid size up to 368*368*368, but at 376*376*376 the speedup only reaches 84, because the limited GPU resources have been used up.

From the analysis above, with correctness preserved, a single GPU is able to deliver a speedup of more than 80 times, which makes it a very attractive alternative for solving highly compute-intensive algorithms and problems.

V. CONCLUSIONS AND FUTURE WORK

In this study, the feasibility and efficiency of a GPGPU-aided 3D staggered-grid finite-difference approach to seismic wave modeling were explored. Finite difference is a simple, fast and effective numerical method for seismic wave modeling, and it has been widely used in forward waveform inversion and reverse time migration. However, the intensive computation required by three-dimensional seismic forward modeling has restricted the industrial application of 3D pre-stack reverse time migration and inversion. To meet this computational demand, this study proposes an approach that uses CUDA with thousands of threads: the complex matrix operations are divided into simple calculations that map exactly onto the thread structure, i.e., each thread only calculates one element of the result matrix. With several optimization techniques applied, the improvement becomes considerably larger. Experiments have been carried out to evaluate the performance of the parallel variants of the 3D staggered-grid finite-difference method using simulations of different sizes. We have shown that the GPU code is accurate by comparing our results with results obtained by running the same simulation on a CPU core. We have accelerated a 3D finite-difference wave propagation code by a factor of 57 to 88 compared with a serial implementation, using one NVIDIA GPU graphics card and the CUDA programming language, which reduces the simulation time from about one hour to about half a minute.
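The grid-size ceiling reported in the experiments can be related to the card's memory with a short estimate. The C sketch below is a back-of-the-envelope calculation under stated assumptions: the nine 3D arrays are taken to hold 4-byte single-precision values (the paper does not state the precision), padding and ghost cells are ignored, and the function names grid_bytes and max_cubic_grid are hypothetical.

```c
#include <stdint.h>

/* Bytes needed for `arrays` cubic 3D arrays of n*n*n elements,
   each element occupying elem_size bytes. */
uint64_t grid_bytes(uint64_t n, uint64_t arrays, uint64_t elem_size)
{
    return n * n * n * arrays * elem_size;
}

/* Largest cubic edge length n whose arrays still fit in budget_bytes. */
uint64_t max_cubic_grid(uint64_t budget_bytes, uint64_t arrays, uint64_t elem_size)
{
    uint64_t n = 0;
    while (grid_bytes(n + 1, arrays, elem_size) <= budget_bytes)
        ++n;
    return n;
}
```

Under these assumptions, grid_bytes(376, 9, 4) evaluates to 1,913,665,536 bytes, which is of the same order as the roughly 1.9 GB that the experiments report as available on the 2 GB card, so only a slightly larger cubic grid would fit on a single card of that size.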
As grid sizes grow even larger, more efficient methods must be considered, such as a multi-GPU program or a redesigned algorithm. Our future work will focus on determining which of these is better.

ACKNOWLEDGMENT

This work was sponsored in part by the Hundred University Talent of Creative Research Excellence Programme (Hebei, China), the National Natural Science Foundation of China (grants No. ), the Programme of High-Resolution Earth Observing System (China), the Fundamental Research Funds for the Central Universities (No. CUGL100608, No. CUGL and G , China University of Geosciences, Wuhan), the Specialized Research Fund for the Doctoral Program of Higher Education ( ), and the Program for New Century Excellent Talents in University (grant No. NCET ).

REFERENCES

[1] K. S. Yee, "Numerical solution of initial boundary value problems involving Maxwell's equations," IEEE Transactions on Antennas and Propagation, vol. 14, no. 3.
[2] J. Virieux, "P-SV wave propagation in heterogeneous media: velocity-stress finite-difference method," Geophysics, vol. 51, no. 4, 1986.
[3] A. J. Chorin, "Numerical solution of the Navier-Stokes equations," American Mathematical Society, vol. 22, no. 104.
[4] P. Moczo, J. Robertsson and L. Eisner, "The finite-difference time-domain method for modeling of seismic wave propagation," Advances in Geophysics, vol. 48, no. 8.
[5] J. M. Carcione and P. J. Wang, "A Chebyshev collocation method for the wave equation in generalized coordinates," Computers and Fluid Dynamics Journal, no. 2.
[6] D. Komatitsch, F. Coutel and P. Mora, "Tensorial formulation of the wave equation for modelling curved interfaces," Geophysical Journal International, vol. 127, no. 1.
[7] H. Kawase, "Time-domain response of a semi-circular canyon for incident SV, P and Rayleigh waves calculated by the discrete wavenumber boundary element method," Bulletin of the Seismological Society of America, vol. 78.
[8] R. Vai, J. M. Castillo-Covarrubias, F. J. Sánchez-Sesma, D. Komatitsch and J. P. Vilotte, "Elastic wave propagation in an irregularly layered medium," Soil Dynamics and Earthquake Engineering, vol. 18, no. 1.
[9] Q. Liu, J. Polet, D. Komatitsch and J. Tromp, "Spectral-element moment tensor inversions for earthquakes in Southern California," Bulletin of the Seismological Society of America, vol. 94, no. 5.
[10] E. Chaljub, D. Komatitsch, J. P. Vilotte, Y. Capdeville, B. Valette and G. Festa, "Spectral element analysis in seismology," Advances in Wave Propagation in Heterogeneous Media, vol. 48.
[11] J. Tromp, D. Komatitsch and Q. Liu, "Spectral-element and adjoint methods in seismology," Communications in Computational Physics, vol. 3, no. 1, pp. 1-32.
[12] W. H. Reed and T. R. Hill, "Triangular mesh methods for the neutron transport equation," Los Alamos Scientific Laboratory, Los Alamos, USA.
[13] M. J. Grote, A. Schneebeli and D. Schötzau, "Discontinuous Galerkin finite element method for the wave equation," SIAM Journal on Numerical Analysis, vol. 44, no. 6.
[14] J. P. Bérenger, "A Perfectly Matched Layer for the absorption of electromagnetic waves," Journal of Computational Physics, vol. 114, no. 2.
[15] F. Collino and C. Tsogka, "Application of the PML absorbing layer model to the linear elastodynamic problem in anisotropic heterogeneous media," Geophysics, vol. 66, no. 1.
[16] J. A. Roden and S. D. Gedney, "Convolution PML (CPML): an efficient FDTD implementation of the CFS-PML for arbitrary media," Microwave and Optical Technology Letters, vol. 27, no. 5.
[17] R. Martin, D. Komatitsch and S. D. Gedney, "A variational formulation of a stabilized unsplit convolutional perfectly matched layer for the isotropic or anisotropic seismic wave equation," Computer Modeling in Engineering & Sciences, vol. 37, no. 3, 2008b.
[18] S. D. Gedney and B. Zhao, "An auxiliary differential equation formulation for the complex-frequency shifted PML," IEEE Transactions on Antennas and Propagation, vol. 58, no. 3.
[19] R. Martin, D. Komatitsch, S. D. Gedney and E. Bruthiaux, "A high-order time and space formulation of the unsplit perfectly matched layer for the seismic wave equation using Auxiliary Differential Equations (ADE-PML)," Computer Modeling in Engineering & Sciences, vol. 56, no. 1, pp. 17-42.
[20] K. C. Meza-Fajardo and A. S. Papageorgiou, "A nonconvolutional, split-field, perfectly matched layer for wave propagation in isotropic and anisotropic elastic media: stability analysis," Bulletin of the Seismological Society of America, vol. 98, no. 4.
[21] L. Nyland, M. Harris and J. Prins, "Fast N-body simulation with CUDA," GPU Gems 3, Chapter 31, Addison-Wesley Professional, Boston, MA, USA.
[22] D. Komatitsch, G. Erlebacher, D. Göddeke and D. Michéa, "High-order finite-element seismic wave propagation modeling with MPI on a large-scale GPU cluster," Journal of Computational Physics, vol. 229, no. 20.
[23] D. Komatitsch, D. Göddeke, G. Erlebacher and D. Michéa, "Modeling the propagation of elastic waves using spectral elements on a cluster of 192 GPUs," Computer Science - Research and Development, vol. 25, no. 1-1.
[24] R. Abdelkhalek, H. Calandra, O. Coulaud, J. Roman and G. Latu, "Fast seismic modeling and reverse time migration on a GPU cluster," The 2009 High Performance Computing & Simulation.
[25] J. Zhou, D. Unat, D. J. Choi, C. C. Guest and Y. F. Cui, "Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset," Procedia Computer Science, vol. 9.
[26] Y. Cui, K. B. Olsen, T. H. Jordan, K. Lee, J. Zhou, P. Small, D. Roten, G. Ely, D. Panda, A. Chourasia, J. Levesque, S. M. Day and P. Maechling, "Scalable Earthquake Simulation on Petascale Supercomputers," in Proceedings of the 2010 ACM/IEEE Conference on Supercomputing, SC'10, pp. 1-20, Nov.
[27] A. Simone and S. Hestholm, "Instabilities in applying absorbing boundary conditions to high-order seismic modeling algorithms," Geophysics, vol. 63.
[28] S. Williams, A. Waterman and D. Patterson, "Roofline: an insightful visual performance model for multicore architectures," Communications of the ACM, vol. 52, no. 4.
[29] NVIDIA, NVIDIA CUDA Programming Guide.
[30] O. Schenk, M. Christen and H. Burkhart, "Algorithmic performance studies on graphics processing units," Journal of Parallel and Distributed Computing, vol. 68, 2008.


More information

Tu P13 08 A MATLAB Package for Frequency Domain Modeling of Elastic Waves

Tu P13 08 A MATLAB Package for Frequency Domain Modeling of Elastic Waves Tu P13 8 A MATLAB Package for Frequency Domain Modeling of Elastic Waves E. Jamali Hondori* (Kyoto University), H. Mikada (Kyoto University), T.N. Goto (Kyoto University) & J. Takekawa (Kyoto University)

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

Implementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU

Implementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU Implementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU 1 1 Samara National Research University, Moskovskoe Shosse 34, Samara, Russia, 443086 Abstract.

More information

Parallelising Pipelined Wavefront Computations on the GPU

Parallelising Pipelined Wavefront Computations on the GPU Parallelising Pipelined Wavefront Computations on the GPU S.J. Pennycook G.R. Mudalige, S.D. Hammond, and S.A. Jarvis. High Performance Systems Group Department of Computer Science University of Warwick

More information

Wavefield Analysis of Rayleigh Waves for Near-Surface Shear-Wave Velocity. Chong Zeng

Wavefield Analysis of Rayleigh Waves for Near-Surface Shear-Wave Velocity. Chong Zeng Wavefield Analysis of Rayleigh Waves for Near-Surface Shear-Wave Velocity By Chong Zeng Submitted to the graduate degree program in the Department of Geology and the Graduate Faculty of the University

More information

The GPU-based Parallel Calculation of Gravity and Magnetic Anomalies for 3D Arbitrary Bodies

The GPU-based Parallel Calculation of Gravity and Magnetic Anomalies for 3D Arbitrary Bodies Available online at www.sciencedirect.com Procedia Environmental Sciences 12 (212 ) 628 633 211 International Conference on Environmental Science and Engineering (ICESE 211) The GPU-based Parallel Calculation

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

GPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA

GPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA GPGPU LAB Case study: Finite-Difference Time- Domain Method on CUDA Ana Balevic IPVS 1 Finite-Difference Time-Domain Method Numerical computation of solutions to partial differential equations Explicit

More information

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors Samuel Adams and Jason Payne US Air Force Research Laboratory, Human Effectiveness Directorate (AFRL/HE), Brooks City-Base, TX

More information

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation

More information

How to Optimize Geometric Multigrid Methods on GPUs

How to Optimize Geometric Multigrid Methods on GPUs How to Optimize Geometric Multigrid Methods on GPUs Markus Stürmer, Harald Köstler, Ulrich Rüde System Simulation Group University Erlangen March 31st 2011 at Copper Schedule motivation imaging in gradient

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,

More information

Numerical Simulation of Polarity Characteristics of Vector Elastic Wave of Advanced Detection in the Roadway

Numerical Simulation of Polarity Characteristics of Vector Elastic Wave of Advanced Detection in the Roadway Research Journal of Applied Sciences, Engineering and Technology 5(23): 5337-5344, 203 ISS: 2040-7459; e-iss: 2040-7467 Mawell Scientific Organization, 203 Submitted: July 3, 202 Accepted: September 7,

More information

Abstract. Introduction. Kevin Todisco

Abstract. Introduction. Kevin Todisco - Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied

More information

OpenMP for next generation heterogeneous clusters

OpenMP for next generation heterogeneous clusters OpenMP for next generation heterogeneous clusters Jens Breitbart Research Group Programming Languages / Methodologies, Universität Kassel, jbreitbart@uni-kassel.de Abstract The last years have seen great

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

CO2 sequestration crosswell monitoring based upon spectral-element and adjoint methods

CO2 sequestration crosswell monitoring based upon spectral-element and adjoint methods CO2 sequestration crosswell monitoring based upon spectral-element and adjoint methods Christina Morency Department of Geosciences, Princeton University Collaborators: Jeroen Tromp & Yang Luo Computational

More information

Reverse time migration with random boundaries

Reverse time migration with random boundaries Reverse time migration with random boundaries Robert G. Clapp ABSTRACT Reading wavefield checkpoints from disk is quickly becoming the bottleneck in Reverse Time Migration. We eliminate the need to write

More information

2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA

2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA 2006: Short-Range Molecular Dynamics on GPU San Jose, CA September 22, 2010 Peng Wang, NVIDIA Overview The LAMMPS molecular dynamics (MD) code Cell-list generation and force calculation Algorithm & performance

More information

Optimizing Complex Spatially-Variant Coefficient Stencils For Seismic Modeling on GPU

Optimizing Complex Spatially-Variant Coefficient Stencils For Seismic Modeling on GPU Optimizing Complex Spatially-Variant Coefficient Stencils For Seismic Modeling on GPU Jiarui Fang, Haohuan Fu He Zhang, Wei Wu, Nanxun Dai, Lin Gan, Guangwen Yang Ministry of Education Key Lab. for Earth

More information

Optimised corrections for finite-difference modelling in two dimensions

Optimised corrections for finite-difference modelling in two dimensions Optimized corrections for 2D FD modelling Optimised corrections for finite-difference modelling in two dimensions Peter M. Manning and Gary F. Margrave ABSTRACT Finite-difference two-dimensional correction

More information

NUMERICAL MODELING OF STRONG GROUND MOTION USING 3D GEO-MODELS

NUMERICAL MODELING OF STRONG GROUND MOTION USING 3D GEO-MODELS th World Conference on Earthquake Engineering Vancouver, B.C., Canada August -6, 24 Paper No. 92 NUMERICAL MODELING OF STRONG GROUND MOTION USING D GEO-MODELS Peter KLINC, Enrico PRIOLO and Alessandro

More information

An Efficient 3D Elastic Full Waveform Inversion of Time-Lapse seismic data using Grid Injection Method

An Efficient 3D Elastic Full Waveform Inversion of Time-Lapse seismic data using Grid Injection Method 10 th Biennial International Conference & Exposition P 032 Summary An Efficient 3D Elastic Full Waveform Inversion of Time-Lapse seismic data using Grid Injection Method Dmitry Borisov (IPG Paris) and

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can

More information

Reverse-time migration imaging with/without multiples

Reverse-time migration imaging with/without multiples Reverse-time migration imaging with/without multiples Zaiming Jiang, John C. Bancroft, and Laurence R. Lines Imaging with/without multiples ABSTRACT One of the challenges with reverse-time migration based

More information

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs 3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional

More information

Accelerating Molecular Modeling Applications with Graphics Processors

Accelerating Molecular Modeling Applications with Graphics Processors Accelerating Molecular Modeling Applications with Graphics Processors John Stone Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign Research/gpu/ SIAM Conference

More information

Reflectivity modeling for stratified anelastic media

Reflectivity modeling for stratified anelastic media Reflectivity modeling for stratified anelastic media Peng Cheng and Gary F. Margrave Reflectivity modeling ABSTRACT The reflectivity method is widely used for the computation of synthetic seismograms for

More information

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop

More information

high performance medical reconstruction using stream programming paradigms

high performance medical reconstruction using stream programming paradigms high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

Pyramid-shaped grid for elastic wave propagation Feng Chen * and Sheng Xu, CGGVeritas

Pyramid-shaped grid for elastic wave propagation Feng Chen * and Sheng Xu, CGGVeritas Feng Chen * and Sheng Xu, CGGVeritas Summary Elastic wave propagation is elemental to wave-equationbased migration and modeling. Conventional simulation of wave propagation is done on a grid of regular

More information

Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads

Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java s Devrim Akgün Computer Engineering of Technology Faculty, Duzce University, Duzce,Turkey ABSTRACT Developing multi

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

We N Converted-phase Seismic Imaging - Amplitudebalancing Source-independent Imaging Conditions

We N Converted-phase Seismic Imaging - Amplitudebalancing Source-independent Imaging Conditions We N106 02 Converted-phase Seismic Imaging - Amplitudebalancing -independent Imaging Conditions A.H. Shabelansky* (Massachusetts Institute of Technology), A.E. Malcolm (Memorial University of Newfoundland)

More information

Using GPUs to compute the multilevel summation of electrostatic forces

Using GPUs to compute the multilevel summation of electrostatic forces Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of

More information

Implementation of PML in the Depth-oriented Extended Forward Modeling

Implementation of PML in the Depth-oriented Extended Forward Modeling Implementation of PML in the Depth-oriented Extended Forward Modeling Lei Fu, William W. Symes The Rice Inversion Project (TRIP) April 19, 2013 Lei Fu, William W. Symes (TRIP) PML in Extended modeling

More information

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters 1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk

More information

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS BeBeC-2014-08 REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS Steffen Schmidt GFaI ev Volmerstraße 3, 12489, Berlin, Germany ABSTRACT Beamforming algorithms make high demands on the

More information

I. INTRODUCTION. Manisha N. Kella * 1 and Sohil Gadhiya2.

I. INTRODUCTION. Manisha N. Kella * 1 and Sohil Gadhiya2. 2018 IJSRSET Volume 4 Issue 4 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology A Survey on AES (Advanced Encryption Standard) and RSA Encryption-Decryption in CUDA

More information

A Wavelet-Based Method for Simulation of Seismic Wave Propagation

A Wavelet-Based Method for Simulation of Seismic Wave Propagation AGU Fall Meeting, A Wavelet-Based Method for Simulation of Seismic Wave Propagation Tae-Kyung Hong & B.L.N. Kennett Research School of Earth Sciences The Australian National University Abstract Seismic

More information

Optimization of HOM Couplers using Time Domain Schemes

Optimization of HOM Couplers using Time Domain Schemes Optimization of HOM Couplers using Time Domain Schemes Workshop on HOM Damping in Superconducting RF Cavities Carsten Potratz Universität Rostock October 11, 2010 10/11/2010 2009 UNIVERSITÄT ROSTOCK FAKULTÄT

More information

Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 1627, FI-70211, Finland

Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 1627, FI-70211, Finland Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 627, FI-72, Finland timo.lahivaara@uku.fi INTRODUCTION The modeling of the acoustic wave fields often

More information

Optimization of finite-difference kernels on multi-core architectures for seismic applications

Optimization of finite-difference kernels on multi-core architectures for seismic applications Optimization of finite-difference kernels on multi-core architectures for seismic applications V. Etienne 1, T. Tonellot 1, K. Akbudak 2, H. Ltaief 2, S. Kortas 3, T. Malas 4, P. Thierry 4, D. Keyes 2

More information

RTM using effective boundary saving: A staggered grid GPU implementation a

RTM using effective boundary saving: A staggered grid GPU implementation a RTM using effective boundary saving: A staggered grid GPU implementation a a Published in Computers & Geosciences, 68, 64-72, (2014) Pengliang Yang, Jinghuai Gao, and Baoli Wang Xi an Jiaotong University,

More information

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA 3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires

More information

1 Past Research and Achievements

1 Past Research and Achievements Parallel Mesh Generation and Adaptation using MAdLib T. K. Sheel MEMA, Universite Catholique de Louvain Batiment Euler, Louvain-La-Neuve, BELGIUM Email: tarun.sheel@uclouvain.be 1 Past Research and Achievements

More information

Radial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing

Radial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing Radial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing Natasha Flyer National Center for Atmospheric Research Boulder, CO Meshes vs. Mesh-free

More information

GPU implementation of minimal dispersion recursive operators for reverse time migration

GPU implementation of minimal dispersion recursive operators for reverse time migration GPU implementation of minimal dispersion recursive operators for reverse time migration Allon Bartana*, Dan Kosloff, Brandon Warnell, Chris Connor, Jeff Codd and David Kessler, SeismicCity Inc. Paulius

More information

Finite-difference elastic modelling below a structured free surface

Finite-difference elastic modelling below a structured free surface FD modelling below a structured free surface Finite-difference elastic modelling below a structured free surface Peter M. Manning ABSTRACT This paper shows experiments using a unique method of implementing

More information

A Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar

A Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar Vol.137 (SUComS 016), pp.8-17 http://dx.doi.org/1457/astl.016.137.0 A Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar Xu Kui 1, Zhong Heping 1, Huang Pan 1 1 Naval Institute

More information

High performance FDTD algorithm for GPGPU supercomputers

High performance FDTD algorithm for GPGPU supercomputers Journal of Physics: Conference Series PAPER OPEN ACCESS High performance FDTD algorithm for GPGPU supercomputers To cite this article: Andrey Zakirov et al 206 J. Phys.: Conf. Ser. 759 0200 Related content

More information

Interaction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang

Interaction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang 4th International Conference on Sensors, Measurement and Intelligent Materials (ICSMIM 2015) Interaction of Fluid Simulation Based on PhysX Physics Engine Huibai Wang, Jianfei Wan, Fengquan Zhang College

More information

Performance Analysis and Optimization of Gyrokinetic Torodial Code on TH-1A Supercomputer

Performance Analysis and Optimization of Gyrokinetic Torodial Code on TH-1A Supercomputer Performance Analysis and Optimization of Gyrokinetic Torodial Code on TH-1A Supercomputer Xiaoqian Zhu 1,2, Xin Liu 1, Xiangfei Meng 2, Jinghua Feng 2 1 School of Computer, National University of Defense

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

The determination of the correct

The determination of the correct SPECIAL High-performance SECTION: H i gh-performance computing computing MARK NOBLE, Mines ParisTech PHILIPPE THIERRY, Intel CEDRIC TAILLANDIER, CGGVeritas (formerly Mines ParisTech) HENRI CALANDRA, Total

More information

Shallow Water Simulations on Graphics Hardware

Shallow Water Simulations on Graphics Hardware Shallow Water Simulations on Graphics Hardware Ph.D. Thesis Presentation 2014-06-27 Martin Lilleeng Sætra Outline Introduction Parallel Computing and the GPU Simulating Shallow Water Flow Topics of Thesis

More information

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs To Use or Not to Use: CPUs Optimization Techniques on GPGPUs D.R.V.L.B. Thambawita Department of Computer Science and Technology Uva Wellassa University Badulla, Sri Lanka Email: vlbthambawita@gmail.com

More information

Software and Performance Engineering for numerical codes on GPU clusters

Software and Performance Engineering for numerical codes on GPU clusters Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3

More information

Direct Numerical Simulation of Turbulent Boundary Layers at High Reynolds Numbers.

Direct Numerical Simulation of Turbulent Boundary Layers at High Reynolds Numbers. Direct Numerical Simulation of Turbulent Boundary Layers at High Reynolds Numbers. G. Borrell, J.A. Sillero and J. Jiménez, Corresponding author: guillem@torroja.dmt.upm.es School of Aeronautics, Universidad

More information

Accelerating CFD with Graphics Hardware

Accelerating CFD with Graphics Hardware Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery

More information

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark by Nkemdirim Dockery High Performance Computing Workloads Core-memory sized Floating point intensive Well-structured

More information

GPU Computing: Development and Analysis. Part 1. Anton Wijs Muhammad Osama. Marieke Huisman Sebastiaan Joosten

GPU Computing: Development and Analysis. Part 1. Anton Wijs Muhammad Osama. Marieke Huisman Sebastiaan Joosten GPU Computing: Development and Analysis Part 1 Anton Wijs Muhammad Osama Marieke Huisman Sebastiaan Joosten NLeSC GPU Course Rob van Nieuwpoort & Ben van Werkhoven Who are we? Anton Wijs Assistant professor,

More information

Sponge boundary condition for frequency-domain modeling

Sponge boundary condition for frequency-domain modeling GEOPHYSIS, VOL. 60, NO. 6 (NOVEMBER-DEEMBER 1995); P. 1870-1874, 6 FIGS. Sponge boundary condition for frequency-domain modeling hangsoo Shin ABSTRAT Several techniques have been developed to get rid of

More information

Applications of Berkeley s Dwarfs on Nvidia GPUs

Applications of Berkeley s Dwarfs on Nvidia GPUs Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse

More information

Fast Multipole and Related Algorithms

Fast Multipole and Related Algorithms Fast Multipole and Related Algorithms Ramani Duraiswami University of Maryland, College Park http://www.umiacs.umd.edu/~ramani Joint work with Nail A. Gumerov Efficiency by exploiting symmetry and A general

More information

arxiv: v1 [math.na] 26 Jun 2014

arxiv: v1 [math.na] 26 Jun 2014 for spectrally accurate wave propagation Vladimir Druskin, Alexander V. Mamonov and Mikhail Zaslavsky, Schlumberger arxiv:406.6923v [math.na] 26 Jun 204 SUMMARY We develop a method for numerical time-domain

More information

Asynchronous OpenCL/MPI numerical simulations of conservation laws

Asynchronous OpenCL/MPI numerical simulations of conservation laws Asynchronous OpenCL/MPI numerical simulations of conservation laws Philippe HELLUY 1,3, Thomas STRUB 2. 1 IRMA, Université de Strasbourg, 2 AxesSim, 3 Inria Tonus, France IWOCL 2015, Stanford Conservation

More information

Center for Computational Science

Center for Computational Science Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,

More information

Fast Multipole Method on the GPU

Fast Multipole Method on the GPU Fast Multipole Method on the GPU with application to the Adaptive Vortex Method University of Bristol, Bristol, United Kingdom. 1 Introduction Particle methods Highly parallel Computational intensive Numerical

More information

Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak

Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models C. Aberle, A. Hakim, and U. Shumlak Aerospace and Astronautics University of Washington, Seattle American Physical Society

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

MD-CUDA. Presented by Wes Toland Syed Nabeel

MD-CUDA. Presented by Wes Toland Syed Nabeel MD-CUDA Presented by Wes Toland Syed Nabeel 1 Outline Objectives Project Organization CPU GPU GPGPU CUDA N-body problem MD on CUDA Evaluation Future Work 2 Objectives Understand molecular dynamics (MD)

More information

Seismic modelling with the reflectivity method

Seismic modelling with the reflectivity method Yongwang Ma, Luiz Loures, and Gary F. Margrave ABSTRACT Seismic modelling with the reflectivity method Numerical seismic modelling is a powerful tool in seismic imaging, interpretation and inversion. Wave

More information
