GPGPU-Aided 3D Staggered-grid Finite-difference Seismic Wave Modeling
Chang Cai 1, Haiqing Chen 2, Ze Deng 1, Dan Chen 1,*, Samee U. Khan 3, Ke Zeng 1, Minxiao Wu 1
1 School of Computer Science, China University of Geosciences, Wuhan, China
2 School of Optoelectronics, Hankou University, Wuhan, China
3 Department of Electrical and Computer Engineering, North Dakota State University, North Dakota, US
* Corresponding author: chendan@pmail.ntu.edu.sg

Abstract: Finite difference is a simple, fast and effective numerical method for seismic wave modeling and has been widely used in forward waveform inversion and reverse time migration. However, the intensive computation of three-dimensional seismic forward modeling has been restricting the industrial application of 3D pre-stack reverse time migration and inversion. To address this problem, this paper develops a parallelized 3D staggered-grid finite-difference method, named G-3DFD, using general-purpose computing on graphics processing units (GPGPU), since graphics processing units (GPUs) have emerged as an effective alternative to traditional general-purpose processors and are increasingly capable of accelerating large-scale scientific computing. We analyze the three-dimensional staggered-grid finite-difference method for implementation on the GPU, making the industrial application of 3D pre-stack reverse time migration and inversion possible. Experiments show that G-3DFD runs up to 88 times faster on modern GPGPU platforms than the original CPU implementation.

Keywords: Seismic Wave Modeling; 3D Stencil Computation; GPGPU; Parallel Computing

I. INTRODUCTION

Finite-difference (FD) techniques in the time domain (FDTD) are widely used to solve wave equations such as Maxwell's equations [1] or the seismic wave equation [2], and have also been used to solve the Navier-Stokes equations [3].
A thorough review of FD applied to the seismic wave equation can be found in [4]. When more geometrical flexibility is needed, for instance to handle geometrically complex models, other techniques such as a pseudo-spectral technique [5][6], a boundary-element method [7][8], a spectral-element method [11] or a discontinuous Galerkin technique [12][13] can be preferred; nevertheless, FD remains widely used because of its simplicity. FD is often used in conjunction with a perfectly matched layer (PML) [14] to absorb waves at the artificial edges of the numerical grid and thus mimic an infinite or semi-infinite medium. A convolutional PML (CPML) can be used to improve the behavior of the discrete PML for waves impinging on the artificial edges of the grid at grazing incidence [16]; see [17] for unsplit CPMLs for seismic waves. More recently, a formulation of the unsplit CPML that can easily be extended to higher-order time schemes, called the auxiliary differential equation PML (ADE-PML), has been introduced by [18] for Maxwell's equations and by [19] for the seismic wave equation. An improved sponge layer, improperly called the split multiaxial PML (M-PML), has been suggested by [20], but the perfectly matched character of Bérenger's PML is lost because of the coupling introduced between derivatives along multiple grid axes.

In recent years, GPUs have been widely used to accelerate general-purpose applications. A number of physical computing problems have been solved on GPUs, e.g., molecular dynamics simulations, fluid dynamics simulations and astrophysical calculations [21]. Regarding FD, several applications have been ported to GPUs. Some spectral-element methods have been implemented on a single GPU [6], while others are based on multiple GPUs [22][23]. GPU programming on NVIDIA graphics cards has become significantly easier with the introduction of the CUDA programming language, which is relatively easy to learn because its syntax is similar to C.
Regarding FD for seismic reverse time migration in an acoustic medium with constant density, Abdelkhalek and Micikevicius (the latter from NVIDIA Corporation, the developer of GPU hardware and of CUDA) have recently introduced optimized implementations, and Abdelkhalek extended this work to the acoustic case with heterogeneous density [24]. In this study, we use CUDA to parallelize 3D staggered-grid finite-difference seismic wave modeling in a more complex, fully heterogeneous medium. We propose several new techniques adapted to the CUDA programming model, covering kernels, the thread hierarchy and the memory hierarchy. Since the number of grid points is very large, we use one thread to process all the points in a row; this method not only addresses the limited resources of the GPU but also easily achieves
coalesced access to global memory, which reduces the number of global-memory accesses and lets us fully exploit the limited shared memory of the GPU hardware. Because the GPU has many more cores and greater compute capability than the CPU, our program runs all calculations on the GPU, while memory management and result output are executed on the CPU. We implemented the 3D staggered-grid finite-difference seismic wave modeling both in C running on the CPU and in CUDA running on the GPU. Performance comparisons between them show that the GPU is well suited to this highly compute-intensive problem: the speedup ranges from 57 to 88, which reduces the run time from about one hour to half a minute and makes the industrial application of 3D pre-stack reverse time migration and inversion possible.

II. METHODS

A. 3D Staggered-grid Finite-difference

This section describes the formulation of the AWP-ODC numerical model and analyzes its computation steps. As stated in [25], AWP-ODC solves the 3D velocity-stress wave equation using a staggered-grid finite-difference method that is fourth-order accurate in space and second-order accurate in time. The coupled system of partial differential equations involves the particle velocity vector v and the symmetric stress tensor σ [26]:

$$\mathbf{v} = \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix}, \qquad \boldsymbol{\sigma} = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{xy} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{xz} & \sigma_{yz} & \sigma_{zz} \end{pmatrix} \tag{1}$$

The governing elastodynamic equations are then [45]:

$$\partial_t \mathbf{v} = \frac{1}{\rho}\,\nabla\cdot\boldsymbol{\sigma} \tag{2}$$

$$\partial_t \boldsymbol{\sigma} = \lambda\,(\nabla\cdot\mathbf{v})\,\mathbf{I} + \mu\left(\nabla\mathbf{v} + \nabla\mathbf{v}^{T}\right) \tag{3}$$

where λ and μ are the Lamé coefficients and ρ is the constant density. Expanding (2) and (3) yields three scalar equations for the velocity components and six scalar equations for the stress components. One velocity equation and two stress equations are listed below:

$$\partial_t v_x = \frac{1}{\rho}\left(\partial_x \sigma_{xx} + \partial_y \sigma_{xy} + \partial_z \sigma_{xz}\right) \tag{4}$$

$$\partial_t \sigma_{xx} = (\lambda + 2\mu)\,\partial_x v_x + \lambda\left(\partial_y v_y + \partial_z v_z\right) \tag{5}$$

$$\partial_t \sigma_{xy} = \mu\left(\partial_y v_x + \partial_x v_y\right) \tag{6}$$

AWP-ODC is a memory-intensive application: twenty-one 3D arrays are involved in the main computation loop. In addition to the three velocity components and the six symmetric stress components, six temporary variables (r1, r2, r3, r4, r5, r6) and six constant coefficients (λ, μ, ρ, the quality factors Qs and Qp for S and P waves, and the Cerjan boundary-condition variable Cj [27]) are used in the numerical modeling. Fig. 1 presents the computation kernels in the main loop for (4) and (5). The algorithm consists of five steps:

Step 1: Initialize the model parameters, set the size of the 3D grid and the time step, and compute the source wavelet for the simulation.
Step 2: At time step t, compute the spatial derivatives of the stress and update the particle velocities.
Step 3: Compute the spatial derivatives of the velocity field and update the stresses.
Step 4: Check whether t > tmax. If the termination condition is not satisfied, set t = t + 1, add the wavelet source to the simulation and return to Step 2; otherwise, go to Step 5.
Step 5: Output the results of the simulation.

Fig. 1. AWP-ODC algorithm (flowchart: Start → Initiate → velocity computation from the spatial derivatives of the stress → stress computation from the spatial derivatives of the velocity → t > tmax? If no, t = t + 1 and add the wavelet source; if yes, output the result → End)

In the AWP-ODC main loop, the 13-point asymmetric stencil computation for vx reads from three different 3D arrays 12 times and writes to one 3D array only once. The stress calculation includes two 1D asymmetric stencils, with only 4 reads from a single 3D array and no writes. TABLE I summarizes the analysis of the kernels in Fig. 1.
Table I shows that AWP-ODC is a memory-bound application because of its low ratio of FLOPs to bytes (the average operational intensity is around 0.5 FLOP/byte), which means that the application has poor temporal locality and its performance is dominated by the memory system rather than by arithmetic throughput [28].

TABLE I. ANALYSIS OF COMPUTATION KERNELS IN AWP-ODC MAIN LOOP
Kernels | Reads | Writes | FLOPS | FLOPS/Bytes
Velocity Computation Kernel | | | |
Stress Computation Kernel | | | |
Total | | | |
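The memory-bound argument can be made concrete with the roofline model [28], which caps attainable throughput at min(peak FLOP/s, operational intensity × memory bandwidth). The C sketch below evaluates that bound; the function names are hypothetical, and the GTX 680 figures used in the usage note (about 3090 GFLOP/s single-precision peak and 192.2 GB/s bandwidth) are assumed published specifications, not measurements from this study.

```c
/* Roofline model: a kernel is limited either by the memory system
   (operational intensity * bandwidth) or by peak arithmetic throughput. */

/* Attainable GFLOP/s for a kernel with the given operational intensity
   (FLOPs per byte of DRAM traffic). */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double flops_per_byte) {
    double memory_bound = flops_per_byte * bandwidth_gbs;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Ridge point: the intensity above which a kernel becomes compute-bound. */
double ridge_point(double peak_gflops, double bandwidth_gbs) {
    return peak_gflops / bandwidth_gbs;
}

/* Fraction of peak reachable at a given intensity (1.0 means compute-bound). */
double fraction_of_peak(double peak_gflops, double bandwidth_gbs,
                        double flops_per_byte) {
    return roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte)
           / peak_gflops;
}
```

Under these assumptions, roofline_gflops(3090.0, 192.2, 0.5) gives about 96 GFLOP/s, roughly 3% of peak, while the ridge point sits near 16 FLOP/byte. A kernel with an intensity of 0.5 would need about 32 times more arithmetic per byte before compute throughput became the limit, which is why memory-access optimizations dominate the design.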
B. General-purpose Computing on Graphics Processing Units (GPGPU) and Compute Unified Device Architecture (CUDA)

The graphics processing unit (GPU) is specialized for compute-intensive, highly parallel computations [29], especially problems that can be expressed as data-parallel computations (e.g., matrix multiplication). Owing to their highly parallel architecture, modern GPUs offer a theoretical peak performance an order of magnitude higher than that of mainstream CPUs [30]. This feature, together with their high performance-to-price ratio and wide availability, has propelled GPUs to the forefront of high-performance computing, and they are now accessible even in most commodity computers. General-purpose computing on GPUs (GPGPU) has boomed as GPU programmability has improved: developers can now use general-purpose programming languages instead of depending on standard graphics APIs or shader languages, which force them to reformulate algorithms in graphics metaphors.

Fig. 2. Hierarchy of threads and memory in CUDA (a grid of blocks, Block (0, 0) to Block (n, m), with global memory for all grids; per-block shared memory; threads Thread (0, 0) to Thread (i, j), each with per-thread local memory)

Relatively high abstraction levels have emerged, and NVIDIA's Compute Unified Device Architecture (CUDA) is a salient example. CUDA provides a software environment for programming the GPU in a slightly extended version of C, so that a CUDA-based application can execute on the GPU in a massively parallel fashion. A CUDA application defines C functions, called kernels, that are explicitly executed on the GPU (the device). The kernels are normally invoked from the main executable of the application running on the CPU (the host), and each kernel is executed N times in parallel by N individual CUDA threads. As illustrated in Fig. 2, CUDA threads are organized in one to three dimensions: threads are grouped into blocks, and blocks are in turn grouped into grids. Threads in a block can cooperate by efficiently sharing data through fast shared memory and by synchronizing their execution to coordinate memory accesses. Threads, memories and synchronization are exposed to developers for very fine-grained data and thread parallelism, e.g., one float variable per thread. Application developers should therefore partition the problem into a set of fine-grained subtasks that can be mapped to threads. A more detailed introduction to GPGPU and CUDA can be found in [29].

III. PARALLELIZATION

A. GPGPU-aided 3D Staggered-grid Finite-difference

This study focuses on a GPGPU-aided implementation of the 3D staggered-grid finite-difference method. The main difficulty in implementing a finite-difference code on a GPU comes from the stencil computation. For a fourth-order spatial operator, the thread that handles point (i, j, k) needs to access the arrays at (i, j, k), (i + 1, j, k), (i + 2, j, k), (i − 1, j, k), (i − 2, j, k), (i, j + 1, k) and so on, i.e., at the point itself and its neighbors. Handling each grid point on the GPU therefore requires 13 accesses to global memory, which causes a high overhead because of the long access latency of global memory. However, because threads that belong to the same block can access data in faster shared memory, it is possible to significantly reduce the number of memory accesses per grid point and thus drastically improve performance.
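To illustrate which neighbors a fourth-order operator touches, the C sketch below implements the standard fourth-order staggered-grid first derivative (coefficients 9/8 and -1/24) in one dimension, plus a hypothetical 1D analogue of the velocity update (4) with buoyancy b = 1/ρ. This is an assumed textbook form of the operator running on the CPU, not the AWP-ODC source code, and the function names are illustrative.

```c
/* Standard fourth-order staggered-grid coefficients for a first derivative. */
static const double C1 = 9.0 / 8.0;
static const double C2 = -1.0 / 24.0;

/* d(s)/dx at the half node x = (i + 1/2) * h, using the asymmetric
   four-point stencil s[i-1], s[i], s[i+1], s[i+2] on the integer nodes.
   Valid for 1 <= i <= n - 3. */
double d4_staggered(const double *s, int i, double h) {
    return (C1 * (s[i + 1] - s[i]) + C2 * (s[i + 2] - s[i - 1])) / h;
}

/* Hypothetical 1D analogue of (4): v lives on half nodes, sigma on integer
   nodes, and b = 1/rho is the buoyancy at the v locations. Points near the
   edges whose stencil would leave the array are left unchanged. */
void update_v_1d(double *v, const double *sigma, const double *b,
                 int n, double h, double dt) {
    for (int i = 1; i <= n - 3; ++i)
        v[i] += dt * b[i] * d4_staggered(sigma, i, h);
}
```

In 3D, the vx update of (4) applies this operator once along each axis (to σxx, σxy and σxz), i.e., 3 × 4 = 12 neighbor reads plus vx itself, consistent with the 13-point stencil analyzed in Section II. Because the operator is exact for polynomials up to degree four, a cubic test field such as s(x) = x^3 is a convenient correctness check.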
Since the number of threads and the amount of shared memory available to concurrently running blocks are limited, and the number of grid points grows as O(n^3) with the grid size n, there is not enough shared memory to use 3D blocks large enough to sufficiently reduce this ratio. This approach therefore cannot be implemented efficiently, so we split the 3D problem and turn to a more efficient 2D approach, which increases the workload of each thread in order to reduce the number of threads. We parallelize the 3D staggered-grid finite-difference application at the grid level. As indicated in Fig. 3, each grid point could be mapped to an individual task, but the number of points is too large and the workload of each point too small; mapping one row of data points to one thread task is therefore the right design. We also design the program to access global memory in a coalesced way.

Fig. 3. Parallelization at the thread level: one row of data units (grid points) is mapped to one thread task

Full utilization of on-chip memory is the key to achieving high performance in GPU-based applications. Our optimizations toward this goal can be summarized in four steps:

Step 1: Read-Only Memory Cache: Texture and constant memory on the GPU device each have their own cache. As discussed
above, six 3D arrays hold the constant coefficients. We put them into texture memory to take advantage of the texture cache. All scalar constants are put into constant memory to save registers. The other 15 arrays are stored in global memory because their values are updated during the iterations.

Step 2: 2D Domain Decomposition: CUDA is an extension of C/C++, so a 3D array indexed in [x][y][z] order is stored contiguously along the z axis: memory access is fast along z and slow along x. To obtain a better cache hit rate and to let all threads in the same warp access data along the fast z axis instead of the slow x axis, we decompose the 3D grid only along the y and z axes. Each thread computes the entire NX row for a given 2D (y, z) location, as shown in Fig. 4, and each thread block contains tx × ty × 1 threads. Because a thread block is also a 3D structure whose x direction gives the best performance, threads in the x direction must correspond to the z direction of the 3D grid, while threads in the y direction correspond to the y direction of the 3D grid. Since the 32 threads of a warp execute at the same time, the number of threads in the x direction should be a multiple of 32 for better performance.

The program flow shown in Fig. 5 comprises three steps. Step 1: Initialization phase. This step sets the grid size and defines the boundaries of the seismic wave model, so that the size of each parameter array is known and exactly the required amount of GPU memory can be allocated for the subsequent computation (routine Initiat(nMax, Vx, Vy, Vz, P, Bx, By, Bz, Pla, Rho, Source, Eponge) in Fig. 5). Step 2: After initialization, the computation moves to the GPU, where a two-dimensional thread block is defined to identify the CUDA threads. The algorithm then sets up the PML boundary, initializes the wavefield variables and generates the source sequence.
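The 2D domain decomposition of optimization Step 2 can be emulated on the CPU to check the index mapping. In the C sketch below, which is an illustrative assumption rather than the authors' kernel code, nested loops stand in for CUDA's blockIdx/threadIdx: the x thread index selects z (the contiguous axis, so consecutive threads of a warp touch consecutive addresses), the y thread index selects y, and each thread walks the whole row along x. The block shape TX × TY = 32 × 4, the names, and the padded pitch (NZ plus two ghost layers on each side, rounded up to a multiple of 32) are hypothetical choices.

```c
#include <stdlib.h>

#define TX 32    /* hypothetical threads per block along z (one warp) */
#define TY 4     /* hypothetical threads per block along y */
#define GHOST 2  /* ghost layers per side for a fourth-order stencil */

/* Round NZ plus both ghost regions up to a multiple of 32. */
int padded_pitch(int nz, int ghost) {
    int n = nz + 2 * ghost;
    return (n + 31) / 32 * 32;
}

/* Linear offset of (i, j, k) in an x-slowest, z-fastest array. */
size_t offset3d(int i, int j, int k, int ny, int nzp) {
    return ((size_t)i * ny + j) * nzp + k;
}

/* CPU emulation of one launch: one "thread" per (y, z) column, each looping
   over all x. Increments visits[] for every interior point a thread touches
   and returns the number of active threads. */
long emulate_launch(int nx, int ny, int nz, int *visits) {
    int nzp = padded_pitch(nz, GHOST);
    int gridx = (nz + TX - 1) / TX;   /* blocks along z */
    int gridy = (ny + TY - 1) / TY;   /* blocks along y */
    long threads = 0;
    for (int by = 0; by < gridy; ++by)
        for (int bx = 0; bx < gridx; ++bx)
            for (int ty = 0; ty < TY; ++ty)
                for (int tx = 0; tx < TX; ++tx) {
                    int k = bx * TX + tx;   /* z index: consecutive in a warp */
                    int j = by * TY + ty;   /* y index */
                    if (k >= nz || j >= ny)
                        continue;           /* guard for partial blocks */
                    ++threads;
                    for (int i = 0; i < nx; ++i)   /* walk the whole x row */
                        visits[offset3d(i, j, k + GHOST, ny, nzp)] += 1;
                }
    return threads;
}
```

For the 376*376*376 grid of Section IV, padded_pitch(376, 2) returns 384 (12 × 32), and a 32 × 4 block shape gives a launch grid of 12 × 94 blocks, with the 8 surplus threads in the last z-block skipped by the bounds guard.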
Fig. 4. Decomposition for GPU kernels (the 3D grid (NX, NY, NZ) is divided into chunks (NX, Ny, Nz), one per GPU thread block; a point is one data unit)

Step 3: Global Memory Coalescing: For maximum performance, accesses to global memory need to be coalesced. CUDA provides cudaMallocPitch and the cudaPitchedPtr structure (returned by cudaMalloc3D) to help ensure aligned memory access. However, instead of using these library facilities, we manually pad zeros onto the boundaries of the 3D grid along the z axis so that the inner region is aligned in memory. NZ plus the padding size should be a multiple of 32, and the two layers of ghost cells are included in the padding.

Step 4: Memory Management: GPU memory is allocated and managed by the CPU. Memory optimizations are the most important area for performance optimization, and minimizing data transfers between the host and the device is an effective strategy. We batch multiple small transfers into one transfer to significantly reduce the communication cost between GPU and CPU. Since the GPU has limited memory, the program performs only one transfer from CPU to GPU during the whole computation and reuses GPU memory, which lets it handle much larger 3D seismic wave models than the original method. Because the GPU has more cores and computes faster than the CPU, all calculations except memory copies are done on the GPU, as shown in Fig. 5.

Fig. 5. The flow of the GPGPU-aided 3D staggered-grid finite-difference method (flowchart: START → Initiat(...) → Wavelet<<<block_size, thread_size>>>(nmax, dt, source, lwav) and MODEL(RLA, RHO, BX, BY, BZ, DTDX, NX, NY, NZ) → seismic wave modeling on the GPU: Momentum<<<GridSize, BlockSize>>>(vx, vy, vz, bx, by, bz, p) and Loose_ends_k, Loose_ends_j, Loose_ends_i<<<GridSize, BlockSize>>>(vx, vy, bx, by, p), run as parallel tasks on the device independently of the host, with a unique ID for each thread → results → END)

Step 3 of the program flow: This step performs the seismic wave modeling on the GPU to speed up the wavefield calculation, including spatial derivative calculations, matrix or vector products, and sums. To save and map the results, the required results must be copied from the GPU back to the CPU. Alternatively, GPU rendering could be used directly to display the output wavefield snapshots.

IV. EXPERIMENT AND PERFORMANCE COMPARISONS

The new CUDA implementation has been carefully validated for correctness. We compare our GPU version with the traditional CPU-based method; the results are shown in Fig. 6, where the stress computed by both methods is recorded at the source point over the first 200 time steps. The results are almost identical, with tiny differences caused by the different programming languages and compilers.

Fig. 6. Accuracy comparison of waveforms computed on GPU and CPU

On a single GTX 680 card with 2 GB of memory, we use a simple model in order to compute as large a problem as possible. It has nine 3D grid parameters. By trying to allocate increasingly
larger chunks of memory, we found that almost 1.9 GB of memory is available to the user. The largest 3D model we can use has a size of 37.6 km × 37.6 km × 37.6 km discretized on a grid of 376 × 376 × 376 points, i.e., with grid cells of size 100 m in the three spatial directions. To test the performance of the CUDA code running on the GPU, we compare it with the C code running on the CPU. The GPU card is an NVIDIA GTX 680 and the CPU is an Intel(R) Core(TM) i7. We compile the C code with Visual Studio 2008 (VS2008) and the GPU code with the standard NVIDIA nvcc compiler of CUDA 5.0. To investigate the impact of different factors on the proposed method, we ran several measurements on the CPU and on the GTX 680 card. TABLE II compares the performance of the GPU and the CPU; all benchmark experiments run for 200 time steps, and we measure the total time of the modeling. From TABLE II we can observe that:

1) As the grid size grows, the amount of computation grows rapidly (cubically with the number of points per dimension). The CPU needs more than an hour for sizes larger than 370*370*370, so computing large 3D grid seismic wave models on the CPU is a big challenge.
2) The GPU is good at fine-grained computation, especially matrix operations. It takes only about half a minute for the GPU to compute a 368*368*368 3D grid, 88 times faster than the CPU.
3) Even though the GPU can provide an impressive speedup, its resources are limited, so the problem size we can compute and the speedup we obtain depend heavily on GPU resources. TABLE II shows that the speedup increases with the grid size up to 368*368*368, but for a 376*376*376 grid the speedup only reaches 84, because the limited GPU resources have been used up.

TABLE II.
PERFORMANCE COMPARISON BETWEEN CPU AND GPU IN 200 TIME STEPS
Grid Size | CPU (i7) Time (s) | GPU Time (s) | Speedup
128*128*128 | | |
192*192*192 | | |
256*256*256 | | |
368*368*368 | | |
376*376*376 | | |

From the analysis above, a single GPU delivers more than 80 times speedup while preserving correctness, which makes it an excellent alternative for highly compute-intensive algorithms and problems.

V. CONCLUSIONS AND FUTURE WORK

In this study, we explored the feasibility and efficiency of a GPGPU-aided 3D staggered-grid finite-difference approach for seismic wave modeling. Finite difference is a simple, fast and effective numerical method for seismic wave modeling and has been widely used in forward waveform inversion and reverse time migration. However, the intensive computation of three-dimensional seismic forward modeling has been restricting the industrial application of 3D pre-stack reverse time migration and inversion. To cope with this intensive computation, this study proposes an approach that uses CUDA with thousands of threads, in which complex matrix operations are divided into simple calculations that map exactly onto the thread structure, i.e., each thread computes only one element of the result matrix. With several optimization techniques, we obtain a significant further improvement. Experiments with simulations of different sizes were carried out to evaluate the performance of the parallel 3D staggered-grid finite-difference code. We have shown that the GPU code is accurate by comparing its results with those obtained by running the same simulation on a CPU core. Using one NVIDIA GPU graphics card and the CUDA programming language, we have accelerated a 3D finite-difference wave propagation code by a factor of 57 to 88 compared with a serial implementation, which reduces the simulation time from about one hour to half a minute.
As the problem size grows even larger, more efficient approaches, such as multi-GPU programming or a redesign of the algorithm, must be considered; our future work will focus on determining which is better.

ACKNOWLEDGMENT

This work was sponsored in part by the Hundred University Talent of Creative Research Excellence Programme (Hebei, China), the National Natural Science Foundation of China (grants No ), the Programme of High-Resolution Earth Observing System (China), the Fundamental Research Funds for the Central Universities (No. CUGL100608, No. CUGL and G , China University of Geosciences, Wuhan), the Specialized Research Fund for the Doctoral Program of Higher Education ( ), and the Program for New Century Excellent Talents in University (grant No. NCET ).

REFERENCES

[1] K. S. Yee, Numerical solution of initial boundary value problems involving Maxwell's equations, IEEE Transactions on Antennas and Propagation, vol. 14, no. 3.
[2] J. Virieux, P-SV wave propagation in heterogeneous media: velocity-stress finite-difference method, Geophysics, vol. 51, no. 4, 1986.
[3] A. J. Chorin, Numerical solution of the Navier-Stokes equations, American Mathematical Society, vol. 22, no. 104.
[4] P. Moczo, J. Robertsson and L. Eisner, The finite-difference time-domain method for modeling of seismic wave propagation, Advances in Geophysics, vol. 48, no. 8.
[5] J. M. Carcione and P. J. Wang, A Chebyshev collocation method for the wave equation in generalized coordinates, Computers and Fluid Dynamics Journal, no. 2.
[6] D. Komatitsch, F. Coutel and P. Mora, Tensorial formulation of the wave equation for modelling curved interfaces, Geophysical Journal International, vol. 127, no. 1.
[7] H. Kawase, Time-domain response of a semi-circular canyon for incident SV, P and Rayleigh waves calculated by the discrete wavenumber boundary element method, Bulletin of the Seismological Society of America, vol. 78.
[8] R. Vai, J. M. Castillo-Covarrubias, F. J. Sánchez-Sesma, D. Komatitsch and J. P. Vilotte, Elastic wave propagation in an irregularly layered medium, Soil Dynamics and Earthquake Engineering, vol. 18, no. 1.
[9] Q. Liu, J. Polet, D. Komatitsch and J. Tromp, Spectral-element moment tensor inversions for earthquakes in Southern California, Bulletin of the Seismological Society of America, vol. 94, no. 5.
[10] E. Chaljub, D. Komatitsch, J. P. Vilotte, Y. Capdeville, B. Valette and G. Festa, Spectral element analysis in seismology, Advances in Wave Propagation in Heterogeneous Media, vol. 48.
[11] J. Tromp, D. Komatitsch and Q. Liu, Spectral-element and adjoint methods in seismology, Communications in Computational Physics, vol. 3, no. 1, pp. 1-32.
[12] W. H. Reed and T. R. Hill, Triangular mesh methods for the neutron transport equation, Los Alamos Scientific Laboratory, Los Alamos, USA.
[13] M. J. Grote, A. Schneebeli and D. Schötzau, Discontinuous Galerkin finite element method for the wave equation, SIAM Journal on Numerical Analysis, vol. 44, no. 6.
[14] J. P. Bérenger, A perfectly matched layer for the absorption of electromagnetic waves, Journal of Computational Physics, vol. 114, no. 2.
[15] F. Collino and C. Tsogka, Application of the PML absorbing layer model to the linear elastodynamic problem in anisotropic heterogeneous media, Geophysics, vol. 66, no. 1.
[16] J. A. Roden and S. D. Gedney, Convolution PML (CPML): an efficient FDTD implementation of the CFS-PML for arbitrary media, Microwave and Optical Technology Letters, vol. 27, no. 5.
[17] R. Martin, D. Komatitsch and S. D. Gedney, A variational formulation of a stabilized unsplit convolutional perfectly matched layer for the isotropic or anisotropic seismic wave equation, Computer Modeling in Engineering & Sciences, vol. 37, no. 3, 2008.
[18] S. D. Gedney and B. Zhao, An auxiliary differential equation formulation for the complex-frequency shifted PML, IEEE Transactions on Antennas and Propagation, vol. 58, no. 3.
[19] R. Martin, D. Komatitsch, S. D. Gedney and E. Bruthiaux, A high-order time and space formulation of the unsplit perfectly matched layer for the seismic wave equation using Auxiliary Differential Equations (ADE-PML), Computer Modeling in Engineering & Sciences, vol. 56, no. 1, pp. 17-42.
[20] K. C. Meza-Fajardo and A. S. Papageorgiou, A nonconvolutional, split-field, perfectly matched layer for wave propagation in isotropic and anisotropic elastic media: stability analysis, Bulletin of the Seismological Society of America, vol. 98, no. 4.
[21] L. Nyland, M. Harris and J. Prins, Fast N-body simulation with CUDA, GPU Gems 3, Chapter 31, Addison-Wesley Professional, Boston, MA, USA.
[22] D. Komatitsch, G. Erlebacher, D. Göddeke and D. Michéa, High-order finite-element seismic wave propagation modeling with MPI on a large-scale GPU cluster, Journal of Computational Physics, vol. 229, no. 20.
[23] D. Komatitsch, D. Göddeke, G. Erlebacher and D. Michéa, Modeling the propagation of elastic waves using spectral elements on a cluster of 192 GPUs, Computer Science - Research and Development, vol. 25, no. 1-1.
[24] R. Abdelkhalek, H. Calandra, O. Coulaud, J. Roman and G. Latu, Fast seismic modeling and reverse time migration on a GPU cluster, The 2009 High Performance Computing & Simulation.
[25] J. Zhou, D. Unat, D. J. Choi, C. C. Guest and Y. F. Cui, Hands-on performance tuning of 3D finite difference earthquake simulation on GPU Fermi chipset, Procedia Computer Science, vol. 9.
[26] Y. Cui, K. B. Olsen, T. H. Jordan, K. Lee, J. Zhou, P. Small, D. Roten, G. Ely, D. Panda, A. Chourasia, J. Levesque, S. M. Day and P. Maechling, Scalable earthquake simulation on petascale supercomputers, in Proceedings of the 2010 ACM/IEEE Conference on Supercomputing (SC '10), pp. 1-20, Nov.
[27] A. Simone and S. Hestholm, Instabilities in applying absorbing boundary conditions to high-order seismic modeling algorithms, Geophysics, vol. 63.
[28] S. Williams, A. Waterman and D. Patterson, Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol. 52, no. 4.
[29] NVIDIA, NVIDIA CUDA Programming Guide.
[30] O. Schenk, M. Christen and H. Burkhart, Algorithmic performance studies on graphics processing units, Journal of Parallel and Distributed Computing, vol. 68, 2008.
Parallelising Pipelined Wavefront Computations on the GPU S.J. Pennycook G.R. Mudalige, S.D. Hammond, and S.A. Jarvis. High Performance Systems Group Department of Computer Science University of Warwick
More informationWavefield Analysis of Rayleigh Waves for Near-Surface Shear-Wave Velocity. Chong Zeng
Wavefield Analysis of Rayleigh Waves for Near-Surface Shear-Wave Velocity By Chong Zeng Submitted to the graduate degree program in the Department of Geology and the Graduate Faculty of the University
More informationThe GPU-based Parallel Calculation of Gravity and Magnetic Anomalies for 3D Arbitrary Bodies
Available online at www.sciencedirect.com Procedia Environmental Sciences 12 (212 ) 628 633 211 International Conference on Environmental Science and Engineering (ICESE 211) The GPU-based Parallel Calculation
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied
More informationGPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA
GPGPU LAB Case study: Finite-Difference Time- Domain Method on CUDA Ana Balevic IPVS 1 Finite-Difference Time-Domain Method Numerical computation of solutions to partial differential equations Explicit
More informationFinite Difference Time Domain (FDTD) Simulations Using Graphics Processors
Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors Samuel Adams and Jason Payne US Air Force Research Laboratory, Human Effectiveness Directorate (AFRL/HE), Brooks City-Base, TX
More informationACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS
ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation
More informationHow to Optimize Geometric Multigrid Methods on GPUs
How to Optimize Geometric Multigrid Methods on GPUs Markus Stürmer, Harald Köstler, Ulrich Rüde System Simulation Group University Erlangen March 31st 2011 at Copper Schedule motivation imaging in gradient
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More informationAnalyzing the Performance of IWAVE on a Cluster using HPCToolkit
Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,
More informationNumerical Simulation of Polarity Characteristics of Vector Elastic Wave of Advanced Detection in the Roadway
Research Journal of Applied Sciences, Engineering and Technology 5(23): 5337-5344, 203 ISS: 2040-7459; e-iss: 2040-7467 Mawell Scientific Organization, 203 Submitted: July 3, 202 Accepted: September 7,
More informationAbstract. Introduction. Kevin Todisco
- Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image
More informationCUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav
CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture
More informationProfiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency
Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied
More informationOpenMP for next generation heterogeneous clusters
OpenMP for next generation heterogeneous clusters Jens Breitbart Research Group Programming Languages / Methodologies, Universität Kassel, jbreitbart@uni-kassel.de Abstract The last years have seen great
More informationGPU Programming Using NVIDIA CUDA
GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics
More informationCO2 sequestration crosswell monitoring based upon spectral-element and adjoint methods
CO2 sequestration crosswell monitoring based upon spectral-element and adjoint methods Christina Morency Department of Geosciences, Princeton University Collaborators: Jeroen Tromp & Yang Luo Computational
More informationReverse time migration with random boundaries
Reverse time migration with random boundaries Robert G. Clapp ABSTRACT Reading wavefield checkpoints from disk is quickly becoming the bottleneck in Reverse Time Migration. We eliminate the need to write
More information2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA
2006: Short-Range Molecular Dynamics on GPU San Jose, CA September 22, 2010 Peng Wang, NVIDIA Overview The LAMMPS molecular dynamics (MD) code Cell-list generation and force calculation Algorithm & performance
More informationOptimizing Complex Spatially-Variant Coefficient Stencils For Seismic Modeling on GPU
Optimizing Complex Spatially-Variant Coefficient Stencils For Seismic Modeling on GPU Jiarui Fang, Haohuan Fu He Zhang, Wei Wu, Nanxun Dai, Lin Gan, Guangwen Yang Ministry of Education Key Lab. for Earth
More informationOptimised corrections for finite-difference modelling in two dimensions
Optimized corrections for 2D FD modelling Optimised corrections for finite-difference modelling in two dimensions Peter M. Manning and Gary F. Margrave ABSTRACT Finite-difference two-dimensional correction
More informationNUMERICAL MODELING OF STRONG GROUND MOTION USING 3D GEO-MODELS
th World Conference on Earthquake Engineering Vancouver, B.C., Canada August -6, 24 Paper No. 92 NUMERICAL MODELING OF STRONG GROUND MOTION USING D GEO-MODELS Peter KLINC, Enrico PRIOLO and Alessandro
More informationAn Efficient 3D Elastic Full Waveform Inversion of Time-Lapse seismic data using Grid Injection Method
10 th Biennial International Conference & Exposition P 032 Summary An Efficient 3D Elastic Full Waveform Inversion of Time-Lapse seismic data using Grid Injection Method Dmitry Borisov (IPG Paris) and
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationReverse-time migration imaging with/without multiples
Reverse-time migration imaging with/without multiples Zaiming Jiang, John C. Bancroft, and Laurence R. Lines Imaging with/without multiples ABSTRACT One of the challenges with reverse-time migration based
More information3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs
3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional
More informationAccelerating Molecular Modeling Applications with Graphics Processors
Accelerating Molecular Modeling Applications with Graphics Processors John Stone Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign Research/gpu/ SIAM Conference
More informationReflectivity modeling for stratified anelastic media
Reflectivity modeling for stratified anelastic media Peng Cheng and Gary F. Margrave Reflectivity modeling ABSTRACT The reflectivity method is widely used for the computation of synthetic seismograms for
More informationVery fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards
Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationPyramid-shaped grid for elastic wave propagation Feng Chen * and Sheng Xu, CGGVeritas
Feng Chen * and Sheng Xu, CGGVeritas Summary Elastic wave propagation is elemental to wave-equationbased migration and modeling. Conventional simulation of wave propagation is done on a grid of regular
More informationPerformance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads
Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java s Devrim Akgün Computer Engineering of Technology Faculty, Duzce University, Duzce,Turkey ABSTRACT Developing multi
More informationOptimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,
More informationWe N Converted-phase Seismic Imaging - Amplitudebalancing Source-independent Imaging Conditions
We N106 02 Converted-phase Seismic Imaging - Amplitudebalancing -independent Imaging Conditions A.H. Shabelansky* (Massachusetts Institute of Technology), A.E. Malcolm (Memorial University of Newfoundland)
More informationUsing GPUs to compute the multilevel summation of electrostatic forces
Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of
More informationImplementation of PML in the Depth-oriented Extended Forward Modeling
Implementation of PML in the Depth-oriented Extended Forward Modeling Lei Fu, William W. Symes The Rice Inversion Project (TRIP) April 19, 2013 Lei Fu, William W. Symes (TRIP) PML in Extended modeling
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationREDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS
BeBeC-2014-08 REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS Steffen Schmidt GFaI ev Volmerstraße 3, 12489, Berlin, Germany ABSTRACT Beamforming algorithms make high demands on the
More informationI. INTRODUCTION. Manisha N. Kella * 1 and Sohil Gadhiya2.
2018 IJSRSET Volume 4 Issue 4 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology A Survey on AES (Advanced Encryption Standard) and RSA Encryption-Decryption in CUDA
More informationA Wavelet-Based Method for Simulation of Seismic Wave Propagation
AGU Fall Meeting, A Wavelet-Based Method for Simulation of Seismic Wave Propagation Tae-Kyung Hong & B.L.N. Kennett Research School of Earth Sciences The Australian National University Abstract Seismic
More informationOptimization of HOM Couplers using Time Domain Schemes
Optimization of HOM Couplers using Time Domain Schemes Workshop on HOM Damping in Superconducting RF Cavities Carsten Potratz Universität Rostock October 11, 2010 10/11/2010 2009 UNIVERSITÄT ROSTOCK FAKULTÄT
More informationTimo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 1627, FI-70211, Finland
Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 627, FI-72, Finland timo.lahivaara@uku.fi INTRODUCTION The modeling of the acoustic wave fields often
More informationOptimization of finite-difference kernels on multi-core architectures for seismic applications
Optimization of finite-difference kernels on multi-core architectures for seismic applications V. Etienne 1, T. Tonellot 1, K. Akbudak 2, H. Ltaief 2, S. Kortas 3, T. Malas 4, P. Thierry 4, D. Keyes 2
More informationRTM using effective boundary saving: A staggered grid GPU implementation a
RTM using effective boundary saving: A staggered grid GPU implementation a a Published in Computers & Geosciences, 68, 64-72, (2014) Pengliang Yang, Jinghuai Gao, and Baoli Wang Xi an Jiaotong University,
More information3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA
3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires
More information1 Past Research and Achievements
Parallel Mesh Generation and Adaptation using MAdLib T. K. Sheel MEMA, Universite Catholique de Louvain Batiment Euler, Louvain-La-Neuve, BELGIUM Email: tarun.sheel@uclouvain.be 1 Past Research and Achievements
More informationRadial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing
Radial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing Natasha Flyer National Center for Atmospheric Research Boulder, CO Meshes vs. Mesh-free
More informationGPU implementation of minimal dispersion recursive operators for reverse time migration
GPU implementation of minimal dispersion recursive operators for reverse time migration Allon Bartana*, Dan Kosloff, Brandon Warnell, Chris Connor, Jeff Codd and David Kessler, SeismicCity Inc. Paulius
More informationFinite-difference elastic modelling below a structured free surface
FD modelling below a structured free surface Finite-difference elastic modelling below a structured free surface Peter M. Manning ABSTRACT This paper shows experiments using a unique method of implementing
More informationA Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar
Vol.137 (SUComS 016), pp.8-17 http://dx.doi.org/1457/astl.016.137.0 A Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar Xu Kui 1, Zhong Heping 1, Huang Pan 1 1 Naval Institute
More informationHigh performance FDTD algorithm for GPGPU supercomputers
Journal of Physics: Conference Series PAPER OPEN ACCESS High performance FDTD algorithm for GPGPU supercomputers To cite this article: Andrey Zakirov et al 206 J. Phys.: Conf. Ser. 759 0200 Related content
More informationInteraction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang
4th International Conference on Sensors, Measurement and Intelligent Materials (ICSMIM 2015) Interaction of Fluid Simulation Based on PhysX Physics Engine Huibai Wang, Jianfei Wan, Fengquan Zhang College
More informationPerformance Analysis and Optimization of Gyrokinetic Torodial Code on TH-1A Supercomputer
Performance Analysis and Optimization of Gyrokinetic Torodial Code on TH-1A Supercomputer Xiaoqian Zhu 1,2, Xin Liu 1, Xiangfei Meng 2, Jinghua Feng 2 1 School of Computer, National University of Defense
More informationB. Tech. Project Second Stage Report on
B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic
More informationThe determination of the correct
SPECIAL High-performance SECTION: H i gh-performance computing computing MARK NOBLE, Mines ParisTech PHILIPPE THIERRY, Intel CEDRIC TAILLANDIER, CGGVeritas (formerly Mines ParisTech) HENRI CALANDRA, Total
More informationShallow Water Simulations on Graphics Hardware
Shallow Water Simulations on Graphics Hardware Ph.D. Thesis Presentation 2014-06-27 Martin Lilleeng Sætra Outline Introduction Parallel Computing and the GPU Simulating Shallow Water Flow Topics of Thesis
More informationTo Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs
To Use or Not to Use: CPUs Optimization Techniques on GPGPUs D.R.V.L.B. Thambawita Department of Computer Science and Technology Uva Wellassa University Badulla, Sri Lanka Email: vlbthambawita@gmail.com
More informationSoftware and Performance Engineering for numerical codes on GPU clusters
Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3
More informationDirect Numerical Simulation of Turbulent Boundary Layers at High Reynolds Numbers.
Direct Numerical Simulation of Turbulent Boundary Layers at High Reynolds Numbers. G. Borrell, J.A. Sillero and J. Jiménez, Corresponding author: guillem@torroja.dmt.upm.es School of Aeronautics, Universidad
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationChallenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery
Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark by Nkemdirim Dockery High Performance Computing Workloads Core-memory sized Floating point intensive Well-structured
More informationGPU Computing: Development and Analysis. Part 1. Anton Wijs Muhammad Osama. Marieke Huisman Sebastiaan Joosten
GPU Computing: Development and Analysis Part 1 Anton Wijs Muhammad Osama Marieke Huisman Sebastiaan Joosten NLeSC GPU Course Rob van Nieuwpoort & Ben van Werkhoven Who are we? Anton Wijs Assistant professor,
More informationSponge boundary condition for frequency-domain modeling
GEOPHYSIS, VOL. 60, NO. 6 (NOVEMBER-DEEMBER 1995); P. 1870-1874, 6 FIGS. Sponge boundary condition for frequency-domain modeling hangsoo Shin ABSTRAT Several techniques have been developed to get rid of
More informationApplications of Berkeley s Dwarfs on Nvidia GPUs
Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse
More informationFast Multipole and Related Algorithms
Fast Multipole and Related Algorithms Ramani Duraiswami University of Maryland, College Park http://www.umiacs.umd.edu/~ramani Joint work with Nail A. Gumerov Efficiency by exploiting symmetry and A general
More informationarxiv: v1 [math.na] 26 Jun 2014
for spectrally accurate wave propagation Vladimir Druskin, Alexander V. Mamonov and Mikhail Zaslavsky, Schlumberger arxiv:406.6923v [math.na] 26 Jun 204 SUMMARY We develop a method for numerical time-domain
More informationAsynchronous OpenCL/MPI numerical simulations of conservation laws
Asynchronous OpenCL/MPI numerical simulations of conservation laws Philippe HELLUY 1,3, Thomas STRUB 2. 1 IRMA, Université de Strasbourg, 2 AxesSim, 3 Inria Tonus, France IWOCL 2015, Stanford Conservation
More informationCenter for Computational Science
Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,
More informationFast Multipole Method on the GPU
Fast Multipole Method on the GPU with application to the Adaptive Vortex Method University of Bristol, Bristol, United Kingdom. 1 Introduction Particle methods Highly parallel Computational intensive Numerical
More informationDevelopment of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak
Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models C. Aberle, A. Hakim, and U. Shumlak Aerospace and Astronautics University of Washington, Seattle American Physical Society
More informationPerformance potential for simulating spin models on GPU
Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational
More informationMD-CUDA. Presented by Wes Toland Syed Nabeel
MD-CUDA Presented by Wes Toland Syed Nabeel 1 Outline Objectives Project Organization CPU GPU GPGPU CUDA N-body problem MD on CUDA Evaluation Future Work 2 Objectives Understand molecular dynamics (MD)
More informationSeismic modelling with the reflectivity method
Yongwang Ma, Luiz Loures, and Gary F. Margrave ABSTRACT Seismic modelling with the reflectivity method Numerical seismic modelling is a powerful tool in seismic imaging, interpretation and inversion. Wave
More informationSPECFEM3D_GLOBE. Mostly developed at Caltech (USA) and University of Pau (France) History: v1.0: 1999/2000 ; v3.6: 2005; v4.
SPECFEM3D_GLOBE Min Chen Vala Hjörleifsdóttir Sue Kientz Dimitri Komatitsch Qinya Liu Alessia Maggi David Michéa Brian Savage Bernhard Schuberth Leif Strand Carl Tape Jeroen Tromp The SPECFEM3D source
More informationAccelerated Load Balancing of Unstructured Meshes
Accelerated Load Balancing of Unstructured Meshes Gerrett Diamond, Lucas Davis, and Cameron W. Smith Abstract Unstructured mesh applications running on large, parallel, distributed memory systems require
More informationUniversity of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Locality Optimization of Stencil Applications using Data Dependency Graphs
More informationPhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.
Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationAccelerating GPU computation through mixed-precision methods. Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University
Accelerating GPU computation through mixed-precision methods Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University Outline Motivation Truncated Precision using CUDA Solving Linear
More informationP052 3D Modeling of Acoustic Green's Function in Layered Media with Diffracting Edges
P052 3D Modeling of Acoustic Green's Function in Layered Media with Diffracting Edges M. Ayzenberg* (Norwegian University of Science and Technology, A. Aizenberg (Institute of Petroleum Geology and Geophysics,
More information