Fast Compressive Sensing MRI Reconstruction using multi-GPU system


1 Fast Compressive Sensing MRI Reconstruction using multi-GPU system
Tran Minh Quan, Won-Ki Jeong
High-performance Visual Computing Laboratory, School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST)
UNIST-gil 50 (100 Banyeon-ri), Eonyang-eup, Ulju-gun, Ulsan Metropolitan City, Republic of Korea

2 Talk Overview
- Introduction
- 2D Dynamic Compressive Sensing MRI (CS MRI)
- Split Bregman (SB) Method
- Total Variation
- Fast 2D Discrete Wavelet Transform (DWT) with mixed-band
- Result of single GPU system
- Result of multi GPU system
- Q&A

3 2D Dynamic MRI (2.5D MRI)
- Cardiac MRI
- Perfusion MRI

4 MRI Reconstruction
Measurement model: $f = Ku$, $K = RF$, where $R$ is the sampling mask and $F$ is the 2D Fourier transform.
- Traditional MRI: IFFT2 (very fast)
- Zero-filling reconstruction: IFFT2 (very fast)
- CS MRI: CS reconstruction (very slow)
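The measurement model and the zero-filling baseline can be illustrated with a small CPU sketch. This is not the authors' code: it assumes a 1D signal instead of a 2D image, a naive unitary DFT in place of CUFFT, and a mask that stores unsampled k-space entries as zeros. The names `dft`, `measure`, and `zero_fill_recon` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(n^2) unitary DFT; sign = -1 is the forward transform, +1 the inverse.
cvec dft(const cvec& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    cvec y(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double ang = sign * 2.0 * pi * double(k * j) / double(n);
            acc += x[j] * std::polar(1.0, ang);
        }
        y[k] = acc / std::sqrt(double(n));
    }
    return y;
}

// f = K u = R F u: transform, then keep only the samples selected by the mask.
// Assumption: unsampled entries are stored as zeros rather than dropped.
cvec measure(const cvec& u, const std::vector<bool>& mask) {
    assert(u.size() == mask.size());
    cvec f = dft(u, -1);
    for (std::size_t k = 0; k < f.size(); ++k)
        if (!mask[k]) f[k] = 0.0;
    return f;
}

// Zero-filling reconstruction: inverse transform of the masked k-space.
cvec zero_fill_recon(const cvec& f) { return dft(f, +1); }
```

With a full mask the round trip returns the input; with a sparse mask the result is only an aliased approximation, which is the gap the CS reconstruction is meant to close.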

5 Motivation
Why do we use sparse sampling?
- Greatly reduce the scanning time (~16x): from ~20-40 minutes down to ~1-3 minutes
Why do we use GPUs?
- Speed up the reconstruction time

6 CSMRI Problem
$\min_u J(u)$ s.t. $\sum_i \|K u_i - f_i\|_2^2 < \mu$
Axes: x, y, z (temporal)
- Lustig et al.: $J(u) = \|F_z W_{xy} u\|_1$
- Goldstein et al.: $J(u) = \|\nabla_{xyz} u\|_1$
- Our method: $J(u) = \|W_{xy} u\|_1 + \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1$
=> $\ell_1$ minimization problem

7 Proposed SB CSMRI Algorithm
Objective: $J(u) = \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1 + \|W_{xy} u\|_1$

Initialize $u^0 = R F^{-1} f$ and $d_x^0 = d_y^0 = d_z^0 = w^0 = 0$
While $\|u^k - u^{k-1}\|_2 > tol$
  Sub optimization problem:
    $u^k = \min_u \frac{\mu}{2}\|Ku - f\|_2^2 + \frac{\lambda}{2}\|d^k - \nabla u - b^k\|_2^2 + \frac{\gamma}{2}\|w^k - Wu - b_w^k\|_2^2$
  Update Bregman distances (smoothing/thresholding):
    $d_x^{k+1} = \max(s_{xy}^k - 1/\lambda_x, 0) \, \frac{\nabla_x u^k + b_x^k}{s_{xy}^k}$
    $d_y^{k+1} = \max(s_{xy}^k - 1/\lambda_y, 0) \, \frac{\nabla_y u^k + b_y^k}{s_{xy}^k}$
    $d_z^{k+1} = \max(s_z^k - 1/\lambda_z, 0) \, \frac{\nabla_z u^k + b_z^k}{s_z^k}$
    $w^{k+1} = \mathrm{shrink}(W u^{k+1} + b_w^k, 1/\gamma)$
  Update Bregman variables:
    $b_x^{k+1} = b_x^k + \nabla_x u^{k+1} - d_x^{k+1}$
    $b_y^{k+1} = b_y^k + \nabla_y u^{k+1} - d_y^{k+1}$
    $b_z^{k+1} = b_z^k + \nabla_z u^{k+1} - d_z^{k+1}$
    $b_w^{k+1} = b_w^k + W u^{k+1} - w^{k+1}$
End
Ref: Goldstein et al., The Split Bregman Method for L1-Regularized Problems, SIAM
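The thresholding and Bregman updates in the loop above can be sketched per element on the CPU. This is a hedged illustration, not the authors' kernels: the slide does not define $s_{xy}$, so the sketch assumes the usual isotropic choice $s = \sqrt{v_x^2 + v_y^2}$ with $v = \nabla u + b$, and the names `shrink`, `shrink2`, `Vec2`, and `bregman_update` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar soft-thresholding: shrink(x, t) = sign(x) * max(|x| - t, 0).
double shrink(double x, double t) {
    assert(t >= 0.0);
    const double mag = std::max(std::fabs(x) - t, 0.0);
    return x < 0.0 ? -mag : mag;
}

struct Vec2 { double x, y; };

// Isotropic shrinkage of v = (grad_x u + b_x, grad_y u + b_y):
// d = max(s - 1/lambda, 0) * v / s.
// Assumption: s = sqrt(vx^2 + vy^2); the slide leaves s undefined.
Vec2 shrink2(Vec2 v, double lambda) {
    assert(lambda > 0.0);
    const double s = std::sqrt(v.x * v.x + v.y * v.y);
    if (s == 0.0) return {0.0, 0.0};  // avoid 0/0; the shrunk value is zero anyway
    const double scale = std::max(s - 1.0 / lambda, 0.0) / s;
    return {scale * v.x, scale * v.y};
}

// Bregman variable update: b_new = b + grad_u - d.
double bregman_update(double b, double grad_u, double d) {
    return b + (grad_u - d);
}
```

Every output element depends only on its own inputs, which is why these steps map naturally to one GPU thread per pixel.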

8 Building Blocks
- Iterative solver
- Gradient, Laplacian operators: using Finite Difference Method
- Discrete Fourier transform: CUFFT
- Discrete Wavelet transform: fast GPU mixed-band algorithm
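As a rough illustration of the finite-difference building block, the sketch below implements a forward difference along x, its adjoint, and the resulting 1D Laplacian on a row-major image. The periodic boundary handling and the function names are assumptions made for the example; the slides do not state which boundary condition is used.

```cpp
#include <cassert>
#include <vector>

// Forward difference along x with periodic boundary:
// (Dx u)[y][x] = u[y][x+1] - u[y][x].
std::vector<double> grad_x(const std::vector<double>& u, int rows, int cols) {
    assert(static_cast<int>(u.size()) == rows * cols);
    std::vector<double> g(u.size());
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            g[y * cols + x] = u[y * cols + (x + 1) % cols] - u[y * cols + x];
    return g;
}

// Adjoint of grad_x: (Dx^T v)[y][x] = v[y][x-1] - v[y][x].
std::vector<double> grad_x_adjoint(const std::vector<double>& v, int rows, int cols) {
    assert(static_cast<int>(v.size()) == rows * cols);
    std::vector<double> g(v.size());
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            g[y * cols + x] = v[y * cols + (x - 1 + cols) % cols] - v[y * cols + x];
    return g;
}

// Laplacian along x as -Dx^T Dx, i.e. u[x+1] - 2 u[x] + u[x-1].
std::vector<double> laplacian_x(const std::vector<double>& u, int rows, int cols) {
    std::vector<double> l = grad_x_adjoint(grad_x(u, rows, cols), rows, cols);
    for (double& value : l) value = -value;
    return l;
}
```

The adjoint pair satisfies the identity sum(Dx(u) * v) = sum(u * Dx^T(v)), which is a convenient unit test before porting such operators to a GPU.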

9 2D Wavelet Transform: Traditional Approach

10 2D Wavelet Transform: Traditional vs. Mixed-band

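11 2D Wavelet Transform with Mixed-band (1)
Haar 2x2: $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $W = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$
$G = W M W^T = \frac{1}{2} \begin{pmatrix} a + b + c + d & a - b + c - d \\ a + b - c - d & a - b - c + d \end{pmatrix}$
Figure panels: Haar 2x2, Haar 4x4, Haar 8x8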
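The 2x2 case can be checked numerically with a short CPU sketch. It assumes the orthonormal 2x2 Haar matrix for W (scale 1/sqrt(2)) and a row-major block M = [[a, b], [c, d]]; the helper names `matmul`, `transpose`, and `haar2x2` are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Plain 2x2 matrix product.
Mat2 matmul(const Mat2& A, const Mat2& B) {
    Mat2 C{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

Mat2 transpose(const Mat2& A) {
    Mat2 T{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            T[j][i] = A[i][j];
    return T;
}

// One level of the 2D Haar transform on a 2x2 block: G = W M W^T.
// Assumption: W is the orthonormal Haar matrix (1/sqrt(2)) * [[1, 1], [1, -1]].
Mat2 haar2x2(const Mat2& M) {
    const double r = 1.0 / std::sqrt(2.0);
    const Mat2 W = {{{r, r}, {r, -r}}};
    return matmul(matmul(W, M), transpose(W));
}
```

For M = [[1, 2], [3, 4]] this gives G = [[5, -1], [-2, 0]], matching the closed form 0.5 * [[a+b+c+d, a-b+c-d], [a+b-c-d, a-b-c+d]].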

12 2D Wavelet Transform with Mixed-band (2)
- encode_8 kernel
- decode_8 kernel
Why do we choose block size 8x8?

13 Optimize 2D Haar Wavelet (1)
$G = \frac{1}{2} \begin{pmatrix} a + b + c + d & a - b + c - d \\ a + b - c - d & a - b - c + d \end{pmatrix}$
Slide annotations on the code: Broadcasting (the loads of a, b, c, d), Haar 2x2, Divergence (the if/else chain).

    __global__ void encode_8(float2* src, float2* dst, int nrows, int ncols, int irows, int icols)
    {
        // Read an 8x8 block from global memory to shared memory...
        __syncthreads();
        float2 a, b, c, d;  // Registers: each thread has its own values

        // First time Haar 2x2
        // Broadcasting: every thread reads the four values of its 2x2 neighborhood.
        // The mask is 0 at this level, so the condition holds for all threads.
        if (((tid.y & 0) == 0) && ((tid.x & 0) == 0))
        {
            a = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 0];
            b = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 1];
            c = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 0];
            d = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 1];
        }
        __syncthreads();

        // Haar 2x2: the output depends on the parity of the thread position,
        // so this four-way branch diverges within a warp.
        // float2 arithmetic assumes operator overloads defined elsewhere.
        if (((tid.y & 1) == 0) && ((tid.x & 1) == 0))
            smem[tid.y][tid.x] = 0.5f * (a + b + c + d);
        else if (((tid.y & 1) == 0) && ((tid.x & 1) == 1))
            smem[tid.y][tid.x] = 0.5f * (a - b + c - d);
        else if (((tid.y & 1) == 1) && ((tid.x & 1) == 0))
            smem[tid.y][tid.x] = 0.5f * (a + b - c - d);
        else if (((tid.y & 1) == 1) && ((tid.x & 1) == 1))
            smem[tid.y][tid.x] = 0.5f * (a - b - c + d);
        __syncthreads();

        // Second time Haar 2x2...
        // Third time Haar 2x2...
    }

14 Optimize 2D Haar Wavelet (2)
$G = \frac{1}{2} \begin{pmatrix} a + b + c + d & a - b + c - d \\ a + b - c - d & a - b - c + d \end{pmatrix}$, with the sign pattern given by $H_2$
Recursive representation: $H_d = \frac{1}{\sqrt{2}} \begin{pmatrix} +H_{d-1} & +H_{d-1} \\ +H_{d-1} & -H_{d-1} \end{pmatrix}$
Synthetic representation: $(H_d)_{i,j} = \frac{1}{2^{d/2}} (-1)^{i \cdot j}$, where $\cdot$ is the bitwise dot product
Example: $3 \cdot 2 = (1,1) \cdot (1,0) = 1 \cdot 1 + 1 \cdot 0 = 1$

    __device__ void switchsign(unsigned int intsign, float2* number)
    {
        // Negate the value when intsign is nonzero
        *(number) *= intsign ? (-1.0f) : (1.0f);
    }

    __global__ void encode_8(float2* src, float2* dst, int nrows, int ncols, int irows, int icols)
    {
        // Read an 8x8 block from global memory to shared memory...
        __syncthreads();
        float2 a, b, c, d;  // Registers: each thread has its own values

        // First time Haar 2x2
        if (((tid.y & 0) == 0) && ((tid.x & 0) == 0))
        {
            a = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 0];
            b = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 1];
            c = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 0];
            d = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 1];
        }
        // Barrier so that all loads finish before shared memory is overwritten
        __syncthreads();

        // Each sign is the bitwise dot product of the thread parity with the
        // position of the operand in the 2x2 block, so no branch is needed.
        switchsign(((((tid.y >> 0) & 1) & 0) ^ (((tid.x >> 0) & 1) & 0)), &a);
        switchsign(((((tid.y >> 0) & 1) & 0) ^ (((tid.x >> 0) & 1) & 1)), &b);
        switchsign(((((tid.y >> 0) & 1) & 1) ^ (((tid.x >> 0) & 1) & 0)), &c);
        switchsign(((((tid.y >> 0) & 1) & 1) ^ (((tid.x >> 0) & 1) & 1)), &d);
        smem[tid.y][tid.x] = 0.5f * (a + b + c + d);
        __syncthreads();

        // Second time Haar 2x2...
        // Third time Haar 2x2...
    }
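The synthetic representation can be checked on the CPU with the sketch below, which evaluates the entries of H_d from the bitwise dot product and applies H_2 to a 2x2 block flattened as (a, b, c, d). The function names are hypothetical and the row-major flattening is an assumption chosen to match the encode_8 indexing.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Bitwise dot product: number of bit positions where both i and j are 1.
unsigned bit_dot(unsigned i, unsigned j) {
    unsigned both = i & j;
    unsigned count = 0;
    while (both != 0u) {
        count += both & 1u;
        both >>= 1;
    }
    return count;
}

// Entry (i, j) of H_d: (-1)^(i . j) / 2^(d/2).
double hadamard_entry(unsigned d, unsigned i, unsigned j) {
    assert(i < (1u << d) && j < (1u << d));
    const double sign = (bit_dot(i, j) & 1u) ? -1.0 : 1.0;
    return sign / std::pow(2.0, d / 2.0);
}

// Dense product H_d * v for a vector of length 2^d.
std::vector<double> hadamard_apply(unsigned d, const std::vector<double>& v) {
    const unsigned n = 1u << d;
    assert(v.size() == n);
    std::vector<double> out(n, 0.0);
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j)
            out[i] += hadamard_entry(d, i, j) * v[j];
    return out;
}
```

For (a, b, c, d) = (1, 2, 3, 4) and d = 2 the result is (5, -1, -2, 0), the same four coefficients as the 2x2 Haar block transform read in row-major order.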

15 Optimize 2D Haar Wavelet (3)
Multiply with -1 or +1:

    __device__ void switchsign(unsigned int intsign, float2* number)
    {
        *(number) *= intsign ? (-1.0f) : (1.0f);
    }

Casting signed bit (complex number y = m + i*n stored as float2.x, float2.y):

    __device__ void switchsign(unsigned int intsign, float2* number)
    {
        // The two hex sign-bit masks are elided in the source
        *((long long int*)number) ^= intsign ? 0x... : 0x...;
    }

2D WAVELET TRANSFORM WITH MIXED-BAND (image size 512 x 512, unit: milliseconds)
- Methods compared: Shared Memory, Multiply with -1 or +1, Casting signed bit
- Levels per method: 1 lvl, 3 lvls, 9 lvls
- Rows: 2D Forward Wavelet, 2D Inverse Wavelet
- Timing values: [...] (two N/a entries per row)
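The sign-bit idea can be sketched on the CPU for a single 32-bit float. The mask value 0x80000000 below is an assumption based on the IEEE-754 single-precision layout, not a value taken from the slide, and `switch_sign` is a hypothetical name; the slide's version applies the same trick to both halves of a float2 at once through a 64-bit cast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Flip the sign of x when negate is true by toggling the IEEE-754 sign bit.
// Assumption: float is a 32-bit IEEE-754 value whose top bit is the sign.
float switch_sign(bool negate, float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);         // well-defined type punning
    bits ^= negate ? 0x80000000u : 0x00000000u;  // XOR toggles only the sign bit
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Compared with multiplying by -1.0f or +1.0f, the XOR form replaces floating-point multiplies with an integer operation, which is the trade-off the timing table on this slide appears to measure.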

16 Comparison on Lenna (512x512): Full Decomposition, 9 levels
- Filter Scheme (GPU): [...] milliseconds
- Lifting Scheme (GPU): [...] milliseconds
- Mixed-band (GPU): [...] milliseconds (20x faster)

17 Put everything together with 1 GPU
$\min_u J(u)$ s.t. $\sum_i \|K u_i - f_i\|_2^2 < \mu$
$J(u) = \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1 + \|W_{xy} u\|_1$

18 Results of 2D Dynamic MRI
Flank Tumor Dataset (256 slices), image size: 128x128
Figure panels: 1/1, 1/4, 1/8, 1/10, 1/12, 1/16

19 Performance of the CSMRI reconstruction (in milliseconds)
Image size: 128x128
Columns: 32 slices, 128 slices, 256 slices
Operations:
- Inverse Differentiation
- Compute Right Hand Side
- Sub Optimization Prob (Modified Richardson)
- Forward Differentiation
- Shrinkage
- Update Bregman Parameter
- Update Kspace
- Total
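The table names a Modified Richardson iteration for the sub optimization problem. A generic form of that iteration, x_{k+1} = x_k + omega * (b - A x_k), is sketched below on a small dense system. The dense matrix, the fixed iteration count, and the name `richardson` are assumptions for illustration; in the reconstruction the operator would be applied matrix-free with FFTs and finite differences.

```cpp
#include <cassert>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Modified Richardson iteration: x <- x + omega * (b - A x), starting from x = 0.
// For a symmetric positive definite A it converges when 0 < omega < 2 / lambda_max(A).
Vec richardson(const Mat& A, const Vec& b, double omega, int iters) {
    const std::size_t n = b.size();
    assert(A.size() == n);
    Vec x(n, 0.0);
    for (int it = 0; it < iters; ++it) {
        Vec r(n);
        for (std::size_t i = 0; i < n; ++i) {
            double Ax = 0.0;
            for (std::size_t j = 0; j < n; ++j) Ax += A[i][j] * x[j];
            r[i] = b[i] - Ax;  // residual of row i
        }
        for (std::size_t i = 0; i < n; ++i) x[i] += omega * r[i];
    }
    return x;
}
```

For A = [[4, 1], [1, 3]] and b = (1, 2), omega = 0.25 lies inside the convergence range and the iterates approach (1/11, 7/11).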

20 MultiGPU system information
~]$ lspci -tv
- Advanced Micro Devices [AMD] nee ATI RD890 Northbridge only dual slot (2x8) PCI-e GFX Hydra part
  - NVIDIA Corporation Tesla M2090 (multiple entries)
- Advanced Micro Devices [AMD] nee ATI RD890 Northbridge only dual slot (2x8) PCI-e GFX Hydra part
  - NVIDIA Corporation Tesla M2090 (multiple entries)
MPI

21 MultiGPU Implementation (1)
OpenMP
$J(u) = \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1 + \|W_{xy} u\|_1$
cudaMemcpy peer-to-peer: ~6.1 GB/s
Ref: Paulius M., Implementing 3D Finite Difference code on GPUs, GTC

22 MultiGPU Implementation (2)

23 MultiGPU Implementation (3)
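One plausible reading of these slides is that the slices are split across GPUs along z and that neighboring GPUs exchange boundary slices so the z-gradient can be evaluated. The sketch below only computes such a partition on the CPU. The slab layout, the one-slice halo width, and the names `Slab` and `partition_slices` are assumptions; the slides do not spell out the decomposition.

```cpp
#include <cassert>
#include <vector>

// Slices [begin, end) are owned by one GPU. halo_lo and halo_hi are the
// neighboring slice indices to be copied from adjacent GPUs, or -1 if none.
struct Slab {
    int begin;
    int end;
    int halo_lo;
    int halo_hi;
};

// Split num_slices as evenly as possible across num_gpus contiguous slabs.
std::vector<Slab> partition_slices(int num_slices, int num_gpus) {
    assert(num_gpus > 0 && num_slices >= num_gpus);
    std::vector<Slab> slabs;
    const int base = num_slices / num_gpus;
    const int extra = num_slices % num_gpus;  // the first `extra` slabs get one more slice
    int start = 0;
    for (int g = 0; g < num_gpus; ++g) {
        const int count = base + (g < extra ? 1 : 0);
        Slab s;
        s.begin = start;
        s.end = start + count;
        s.halo_lo = (s.begin > 0) ? s.begin - 1 : -1;
        s.halo_hi = (s.end < num_slices) ? s.end : -1;
        slabs.push_back(s);
        start = s.end;
    }
    return slabs;
}
```

With 256 slices on 4 GPUs each slab owns 64 slices and interior slabs need two halo slices per exchange, which is the kind of traffic a peer-to-peer copy would carry.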

24 Performance on multiple GPUs
Columns: 1 GPU, 2 GPUs (+ Sync), 4 GPUs (+ Sync), [...] GPUs (+ Sync)
Operations:
- Inverse Differentiation
- Compute Right Hand Side
- Sub Optimization Problem
- Forward Differentiation
- Shrinkage
- Update Bregman Parameter
- Update Kspace
- Data Transfer
- Total

25 Scalability
Plot: time (milliseconds) vs. number of GPUs; speedup labels: [...]x, 1.7x, 2.9x, 4.3x

26 Conclusion
Summary
- Split Bregman formulation for dynamic CSMRI
- 2D DWT on the GPU using mixed-band algorithm
- Multi-GPU implementation using P2P communication
Acknowledgement
- Thanks to HyungJoon Cho and SoHyun Han for data and discussion.
- Funding from NRF Grant # 2012R1A1A

27 Thank you

Compressive Sensing Algorithms for Fast and Accurate Imaging

Compressive Sensing Algorithms for Fast and Accurate Imaging Compressive Sensing Algorithms for Fast and Accurate Imaging Wotao Yin Department of Computational and Applied Mathematics, Rice University SCIMM 10 ASU, Tempe, AZ Acknowledgements: results come in part

More information

THE discrete wavelet transform (DWT) has been actively

THE discrete wavelet transform (DWT) has been actively TO APPEAR IN IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1 A fast discrete wavelet transform using hybrid parallelism on GPUs Tran Minh Quan, Student Member, IEEE, and Won-Ki Jeong, Member, IEEE

More information

Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging

Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging Paper ID: 1195 No Institute Given 1 More Experiment Results 1.1 Visual Comparisons We apply all methods on four 2D

More information

Georgia Institute of Technology Center for Signal and Image Processing Steve Conover February 2009

Georgia Institute of Technology Center for Signal and Image Processing Steve Conover February 2009 Georgia Institute of Technology Center for Signal and Image Processing Steve Conover February 2009 Introduction CUDA is a tool to turn your graphics card into a small computing cluster. It s not always

More information

Higher Degree Total Variation for 3-D Image Recovery

Higher Degree Total Variation for 3-D Image Recovery Higher Degree Total Variation for 3-D Image Recovery Greg Ongie*, Yue Hu, Mathews Jacob Computational Biomedical Imaging Group (CBIG) University of Iowa ISBI 2014 Beijing, China Motivation: Compressed

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

Gabor and Wavelet transforms for signals defined on graphs An invitation to signal processing on graphs

Gabor and Wavelet transforms for signals defined on graphs An invitation to signal processing on graphs Gabor and Wavelet transforms for signals defined on graphs An invitation to signal processing on graphs Benjamin Ricaud Signal Processing Lab 2, EPFL Lausanne, Switzerland SPLab, Brno, 10/2013 B. Ricaud

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

CUDA Architecture & Programming Model

CUDA Architecture & Programming Model CUDA Architecture & Programming Model Course on Multi-core Architectures & Programming Oliver Taubmann May 9, 2012 Outline Introduction Architecture Generation Fermi A Brief Look Back At Tesla What s New

More information

Multigrid algorithms on multi-gpu architectures

Multigrid algorithms on multi-gpu architectures Multigrid algorithms on multi-gpu architectures H. Köstler European Multi-Grid Conference EMG 2010 Isola d Ischia, Italy 20.9.2010 2 Contents Work @ LSS GPU Architectures and Programming Paradigms Applications

More information

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

Introduction. Wavelets, Curvelets [4], Surfacelets [5].

Introduction. Wavelets, Curvelets [4], Surfacelets [5]. Introduction Signal reconstruction from the smallest possible Fourier measurements has been a key motivation in the compressed sensing (CS) research [1]. Accurate reconstruction from partial Fourier data

More information

Efficient MR Image Reconstruction for Compressed MR Imaging

Efficient MR Image Reconstruction for Compressed MR Imaging Efficient MR Image Reconstruction for Compressed MR Imaging Junzhou Huang, Shaoting Zhang, and Dimitris Metaxas Division of Computer and Information Sciences, Rutgers University, NJ, USA 08854 Abstract.

More information

Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation

Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M

More information

1.7.1 Laplacian Smoothing

1.7.1 Laplacian Smoothing 1.7.1 Laplacian Smoothing 320491: Advanced Graphics - Chapter 1 434 Theory Minimize energy functional total curvature estimate by polynomial-fitting non-linear (very slow!) 320491: Advanced Graphics -

More information

Lecture 17: Array Algorithms

Lecture 17: Array Algorithms Lecture 17: Array Algorithms CS178: Programming Parallel and Distributed Systems April 4, 2001 Steven P. Reiss I. Overview A. We talking about constructing parallel programs 1. Last time we discussed sorting

More information

G Practical Magnetic Resonance Imaging II Sackler Institute of Biomedical Sciences New York University School of Medicine. Compressed Sensing

G Practical Magnetic Resonance Imaging II Sackler Institute of Biomedical Sciences New York University School of Medicine. Compressed Sensing G16.4428 Practical Magnetic Resonance Imaging II Sackler Institute of Biomedical Sciences New York University School of Medicine Compressed Sensing Ricardo Otazo, PhD ricardo.otazo@nyumc.org Compressed

More information

Sparse sampling in MRI: From basic theory to clinical application. R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology

Sparse sampling in MRI: From basic theory to clinical application. R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology Sparse sampling in MRI: From basic theory to clinical application R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology Objective Provide an intuitive overview of compressed sensing

More information

A primal-dual framework for mixtures of regularizers

A primal-dual framework for mixtures of regularizers A primal-dual framework for mixtures of regularizers Baran Gözcü baran.goezcue@epfl.ch Laboratory for Information and Inference Systems (LIONS) École Polytechnique Fédérale de Lausanne (EPFL) Switzerland

More information

Optical flow and depth from motion for omnidirectional images using a TV-L1 variational framework on graphs

Optical flow and depth from motion for omnidirectional images using a TV-L1 variational framework on graphs ICIP 2009 - Monday, November 9 Optical flow and depth from motion for omnidirectional images using a TV-L1 variational framework on graphs Luigi Bagnato Signal Processing Laboratory - EPFL Advisors: Prof.

More information

Iterative CT Reconstruction Using Curvelet-Based Regularization

Iterative CT Reconstruction Using Curvelet-Based Regularization Iterative CT Reconstruction Using Curvelet-Based Regularization Haibo Wu 1,2, Andreas Maier 1, Joachim Hornegger 1,2 1 Pattern Recognition Lab (LME), Department of Computer Science, 2 Graduate School in

More information

Warp shuffles. Lecture 4: warp shuffles, and reduction / scan operations. Warp shuffles. Warp shuffles

Warp shuffles. Lecture 4: warp shuffles, and reduction / scan operations. Warp shuffles. Warp shuffles Warp shuffles Lecture 4: warp shuffles, and reduction / scan operations Prof. Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 4 p. 1 Warp

More information

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation

More information

IMAGE FUSION WITH SIMULTANEOUS CARTOON AND TEXTURE DECOMPOSITION MAHDI DODANGEH, ISABEL NARRA FIGUEIREDO AND GIL GONÇALVES

IMAGE FUSION WITH SIMULTANEOUS CARTOON AND TEXTURE DECOMPOSITION MAHDI DODANGEH, ISABEL NARRA FIGUEIREDO AND GIL GONÇALVES Pré-Publicações do Departamento de Matemática Universidade de Coimbra Preprint Number 15 14 IMAGE FUSION WITH SIMULTANEOUS CARTOON AND TEXTURE DECOMPOSITION MAHDI DODANGEH, ISABEL NARRA FIGUEIREDO AND

More information

Lecture 4: warp shuffles, and reduction / scan operations

Lecture 4: warp shuffles, and reduction / scan operations Lecture 4: warp shuffles, and reduction / scan operations Prof. Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 4 p. 1 Warp shuffles Warp

More information

4/13/ Introduction. 1. Introduction. 2. Formulation. 2. Formulation. 2. Formulation

4/13/ Introduction. 1. Introduction. 2. Formulation. 2. Formulation. 2. Formulation 1. Introduction Motivation: Beijing Jiaotong University 1 Lotus Hill Research Institute University of California, Los Angeles 3 CO 3 for Ultra-fast and Accurate Interactive Image Segmentation This paper

More information

Compressed Sensing for Rapid MR Imaging

Compressed Sensing for Rapid MR Imaging Compressed Sensing for Rapid Imaging Michael Lustig1, Juan Santos1, David Donoho2 and John Pauly1 1 Electrical Engineering Department, Stanford University 2 Statistics Department, Stanford University rapid

More information

Efficient Imaging Algorithms on Many-Core Platforms

Efficient Imaging Algorithms on Many-Core Platforms Efficient Imaging Algorithms on Many-Core Platforms H. Köstler Dagstuhl, 22.11.2011 Contents Imaging Applications HDR Compression performance of PDE-based models Image Denoising performance of patch-based

More information

Final Review. Image Processing CSE 166 Lecture 18

Final Review. Image Processing CSE 166 Lecture 18 Final Review Image Processing CSE 166 Lecture 18 Topics covered Basis vectors Matrix based transforms Wavelet transform Image compression Image watermarking Morphological image processing Segmentation

More information

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size Overview Videos are everywhere But can take up large amounts of resources Disk space Memory Network bandwidth Exploit redundancy to reduce file size Spatial Temporal General lossless compression Huffman

More information

Parallel and Distributed Sparse Optimization Algorithms

Parallel and Distributed Sparse Optimization Algorithms Parallel and Distributed Sparse Optimization Algorithms Part I Ruoyu Li 1 1 Department of Computer Science and Engineering University of Texas at Arlington March 19, 2015 Ruoyu Li (UTA) Parallel and Distributed

More information

Blind Compressed Sensing Using Sparsifying Transforms

Blind Compressed Sensing Using Sparsifying Transforms Blind Compressed Sensing Using Sparsifying Transforms Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and Coordinated Science Laboratory University of Illinois

More information

The Benefit of Tree Sparsity in Accelerated MRI

The Benefit of Tree Sparsity in Accelerated MRI The Benefit of Tree Sparsity in Accelerated MRI Chen Chen and Junzhou Huang Department of Computer Science and Engineering, The University of Texas at Arlington, TX, USA 76019 Abstract. The wavelet coefficients

More information

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

Accelerating Double Precision FEM Simulations with GPUs

Accelerating Double Precision FEM Simulations with GPUs Accelerating Double Precision FEM Simulations with GPUs Dominik Göddeke 1 3 Robert Strzodka 2 Stefan Turek 1 dominik.goeddeke@math.uni-dortmund.de 1 Mathematics III: Applied Mathematics and Numerics, University

More information

Spread Spectrum Using Chirp Modulated RF Pulses for Incoherent Sampling Compressive Sensing MRI

Spread Spectrum Using Chirp Modulated RF Pulses for Incoherent Sampling Compressive Sensing MRI Spread Spectrum Using Chirp Modulated RF Pulses for Incoherent Sampling Compressive Sensing MRI Sulaiman A. AL Hasani Department of ECSE, Monash University, Melbourne, Australia Email: sulaiman.alhasani@monash.edu

More information

smooth coefficients H. Köstler, U. Rüde

smooth coefficients H. Köstler, U. Rüde A robust multigrid solver for the optical flow problem with non- smooth coefficients H. Köstler, U. Rüde Overview Optical Flow Problem Data term and various regularizers A Robust Multigrid Solver Galerkin

More information

Accelerated MRI Techniques: Basics of Parallel Imaging and Compressed Sensing

Accelerated MRI Techniques: Basics of Parallel Imaging and Compressed Sensing Accelerated MRI Techniques: Basics of Parallel Imaging and Compressed Sensing Peng Hu, Ph.D. Associate Professor Department of Radiological Sciences PengHu@mednet.ucla.edu 310-267-6838 MRI... MRI has low

More information

The Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein October 8, 2015 1 / 30 Introduction Presentation Outline 1 Convex

More information

GREAT PERFORMANCE FOR TINY PROBLEMS: BATCHED PRODUCTS OF SMALL MATRICES. Nikolay Markovskiy Peter Messmer

GREAT PERFORMANCE FOR TINY PROBLEMS: BATCHED PRODUCTS OF SMALL MATRICES. Nikolay Markovskiy Peter Messmer GREAT PERFORMANCE FOR TINY PROBLEMS: BATCHED PRODUCTS OF SMALL MATRICES Nikolay Markovskiy Peter Messmer ABOUT CP2K Atomistic and molecular simulations of solid state From ab initio DFT and Hartree-Fock

More information

Operator Upscaling and Adjoint State Method

Operator Upscaling and Adjoint State Method Operator Upscaling and Adjoint State Method Tetyana Vdovina, William Symes The Rice Inversion Project Rice University vdovina@rice.edu February 0, 009 Motivation Ultimate Goal: Use 3D elastic upscaling

More information

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S

More information

Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging, London, United Kingdom, 2

Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging, London, United Kingdom, 2 Grzegorz Tomasz Kowalik 1, Jennifer Anne Steeden 1, Bejal Pandya 1, David Atkinson 2, Andrew Taylor 1, and Vivek Muthurangu 1 1 Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging,

More information

Image Compression Algorithm for Different Wavelet Codes

Image Compression Algorithm for Different Wavelet Codes Image Compression Algorithm for Different Wavelet Codes Tanveer Sultana Department of Information Technology Deccan college of Engineering and Technology, Hyderabad, Telangana, India. Abstract: - This

More information

Compressed Sensing for Electron Tomography

Compressed Sensing for Electron Tomography University of Maryland, College Park Department of Mathematics February 10, 2015 1/33 Outline I Introduction 1 Introduction 2 3 4 2/33 1 Introduction 2 3 4 3/33 Tomography Introduction Tomography - Producing

More information

Asynchronous OpenCL/MPI numerical simulations of conservation laws

Asynchronous OpenCL/MPI numerical simulations of conservation laws Asynchronous OpenCL/MPI numerical simulations of conservation laws Philippe HELLUY 1,3, Thomas STRUB 2. 1 IRMA, Université de Strasbourg, 2 AxesSim, 3 Inria Tonus, France IWOCL 2015, Stanford Conservation

More information

Tesla Architecture, CUDA and Optimization Strategies

Tesla Architecture, CUDA and Optimization Strategies Tesla Architecture, CUDA and Optimization Strategies Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Tesla Architecture & CUDA CUDA Programming Optimization

More information

A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009]

A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009] A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009] Yongjia Song University of Wisconsin-Madison April 22, 2010 Yongjia Song

More information

Porting Performance across GPUs and FPGAs

Porting Performance across GPUs and FPGAs Porting Performance across GPUs and FPGAs Deming Chen, ECE, University of Illinois In collaboration with Alex Papakonstantinou 1, Karthik Gururaj 2, John Stratton 1, Jason Cong 2, Wen-Mei Hwu 1 1: ECE

More information

Today. Motivation. Motivation. Image gradient. Image gradient. Computational Photography

Today. Motivation. Motivation. Image gradient. Image gradient. Computational Photography Computational Photography Matthias Zwicker University of Bern Fall 009 Today Gradient domain image manipulation Introduction Gradient cut & paste Tone mapping Color-to-gray conversion Motivation Cut &

More information

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering

More information

Reconstruction Improvements on Compressive Sensing

Reconstruction Improvements on Compressive Sensing SCITECH Volume 6, Issue 2 RESEARCH ORGANISATION November 21, 2017 Journal of Information Sciences and Computing Technologies www.scitecresearch.com/journals Reconstruction Improvements on Compressive Sensing

More information

Learning with infinitely many features

Learning with infinitely many features Learning with infinitely many features R. Flamary, Joint work with A. Rakotomamonjy F. Yger, M. Volpi, M. Dalla Mura, D. Tuia Laboratoire Lagrange, Université de Nice Sophia Antipolis December 2012 Example

More information

Automatic FFT Kernel Generation for CUDA GPUs. Akira Nukada Tokyo Institute of Technology

Automatic FFT Kernel Generation for CUDA GPUs. Akira Nukada Tokyo Institute of Technology Automatic FFT Kernel Generation for CUDA GPUs. Akira Nukada Tokyo Institute of Technology FFT (Fast Fourier Transform) FFT is a fast algorithm to compute DFT (Discrete Fourier Transform). twiddle factors

More information

Development of fast imaging techniques in MRI From the principle to the recent development

Development of fast imaging techniques in MRI From the principle to the recent development 980-8575 2-1 2012 10 13 Development of fast imaging techniques in MRI From the principle to the recent development Yoshio MACHIDA and Issei MORI Health Sciences, Tohoku University Graduate School of Medicine

More information

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware

More information

Lecture 5: Error Resilience & Scalability

Lecture 5: Error Resilience & Scalability Lecture 5: Error Resilience & Scalability Dr Reji Mathew A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S 010 jzhang@cse.unsw.edu.au Outline Error Resilience Scalability Including slides

More information

Denoising and Edge Detection Using Sobelmethod

Denoising and Edge Detection Using Sobelmethod International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Denoising and Edge Detection Using Sobelmethod P. Sravya 1, T. Rupa devi 2, M. Janardhana Rao 3, K. Sai Jagadeesh 4, K. Prasanna

More information
