From Image to Video: Real-time Medical Imaging with MRI


1 From Image to Video: Real-time Medical Imaging with MRI
Sebastian Schaetz, Martin Uecker
BiomedNMR Forschungs GmbH at the MPI for Biophysical Chemistry, Goettingen, Germany
Electrical Engineering and Computer Sciences, University of California, Berkeley
GTC 2013 (S3236), March 20th, 2013

2 About Us Computer scientist and physicist, working on Magnetic Resonance Imaging (MRI) in close collaboration with clinicians

3 Magnetic Resonance Imaging (MRI)
- Noninvasive imaging of living organisms
- Visualization of structure and function
- Image slices of arbitrary location and orientation
- No ionizing radiation
- Applications: radiology, clinical research, and neuroscience
- Limitations: strong magnetic field (contraindication, e.g. pacemaker); long measurement times

4 Real-time MRI
Real-time imaging:
- Imaging of dynamic processes
- Adequate spatial and temporal resolution
- No cardiac or respiratory gating, no stop motion
Real-time processing:
- Fast and low-latency reconstruction
- Requirement: bounded processing delay

5-23 Principles of MR Imaging [figures]

24 Image Reconstruction as Inverse Problem
Forward problem: $y = Fx + n$, with data $y$, (nonlinear) operator $F$, image (and more) $x$, and noise $n$
Regularized solution:
$$x = \operatorname*{argmin}_x \underbrace{\|Fx - y\|_2^2}_{\text{data consistency}} + \underbrace{\alpha R(x)}_{\text{regularization}}$$
Advantages:
- Modelling of physical effects (coil sensitivities)
- Prior knowledge via suitable regularization terms
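The regularized solution above can be illustrated with a minimal sketch, assuming a small real-valued linear operator F stored as a dense matrix and the simplest possible penalty R(x) = ||x||_2^2 (Tikhonov). The function name `tikhonov_solve`, the plain gradient-descent solver, and the step size are illustrative assumptions, not the method used in the talk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense matrix, one inner vector per row

// Hypothetical helper: minimise ||F x - y||_2^2 + alpha * ||x||_2^2
// with plain gradient descent, starting from x = 0.
Vec tikhonov_solve(const Mat& F, const Vec& y, double alpha,
                   double step, int iterations) {
    const std::size_t m = F.size();
    const std::size_t n = m ? F[0].size() : 0;
    assert(y.size() == m);
    Vec x(n, 0.0);
    for (int it = 0; it < iterations; ++it) {
        // residual r = F x - y (data consistency term)
        Vec r(m, 0.0);
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) r[i] += F[i][j] * x[j];
            r[i] -= y[i];
        }
        // gradient component g_j = 2 (F^T r)_j + 2 alpha x_j
        for (std::size_t j = 0; j < n; ++j) {
            double g = 2.0 * alpha * x[j];
            for (std::size_t i = 0; i < m; ++i) g += 2.0 * F[i][j] * r[i];
            x[j] -= step * g;
        }
    }
    return x;
}
```

For F equal to the identity, the minimiser is y / (1 + alpha), which shows how the regularization weight pulls the solution away from the raw data.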

25 Nonlinear Inverse Reconstruction
Inverse problem:
- Unknown image $\rho$ and coil sensitivities $c_j$
- Nonlinear equation $Fx = y$
Reconstruction:
- Iteratively regularized Gauss-Newton method (IRGNM):
$$x_{n+1} - x_n = \operatorname*{argmin}_{\delta x} \|DF(x_n)\,\delta x + F(x_n) - y\|_2^2 + \alpha_n \|\delta x + x_n\|_2^2$$
- Smoothness penalty for the coil sensitivities
Uecker et al., Magn Reson Med 60:674-682, 2008
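A scalar toy problem makes the IRGNM update easier to follow. The sketch below assumes the forward operator F(x) = x^2 (so DF(x) = 2x) and a geometrically decreasing weight alpha_n = alpha_0 * q^n; both choices, and the names `forward`, `derivative` and `irgnm_scalar`, are assumptions for illustration only. Each step solves the linearized, regularized problem (DF * dx + F(x_n) - y)^2 + alpha_n * (dx + x_n)^2 in closed form.

```cpp
#include <cassert>
#include <cmath>

// Toy nonlinear forward operator and its derivative (assumed for this sketch).
double forward(double x) { return x * x; }
double derivative(double x) { return 2.0 * x; }

// Iteratively regularized Gauss-Newton iteration for a scalar unknown.
double irgnm_scalar(double y, double x0, double alpha0, double q, int steps) {
    assert(alpha0 > 0.0 && q > 0.0 && q < 1.0);
    double x = x0;
    double alpha = alpha0;
    for (int n = 0; n < steps; ++n) {
        const double J = derivative(x);   // DF(x_n)
        const double r = y - forward(x);  // y - F(x_n)
        // Setting the derivative of the quadratic in dx to zero gives
        // (J^2 + alpha) dx = J r - alpha x.
        const double dx = (J * r - alpha * x) / (J * J + alpha);
        x += dx;
        alpha *= q;  // weaken the regularization in every Newton step
    }
    return x;
}
```

With y = 4 and a start value of 1, the iterates approach 2 as alpha_n shrinks; early steps are damped by the penalty, later steps behave like plain Gauss-Newton.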

26 Challenge: Real-time Reconstruction
- Measurements with up to 50 fps
- Images must be available immediately
- Iterative algorithm
- Fixed problem size: "strong scaling"
- Size and cost of computing system relevant
Zhang S et al, JMRI 2012;35:
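The constraints on this slide translate into simple arithmetic: at 50 fps there are 20 ms per frame, and because the problem size is fixed, additional GPUs only help through strong scaling. The helpers below are a hypothetical sketch of that bookkeeping; the names and the example timings in the usage note are assumptions, not measurements from the talk.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given frame rate.
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Strong scaling: the same problem solved on n devices instead of one.
double speedup(double t_one_ms, double t_n_ms) {
    assert(t_n_ms > 0.0);
    return t_one_ms / t_n_ms;
}

// Fraction of the ideal n-fold speedup that was actually achieved.
double efficiency(double t_one_ms, double t_n_ms, int n) {
    assert(n > 0);
    return speedup(t_one_ms, t_n_ms) / n;
}

// A reconstruction keeps up only if one frame fits into the frame budget.
bool keeps_up(double recon_ms, double fps) {
    return recon_ms <= frame_budget_ms(fps);
}
```

For example, a made-up reconstruction time of 80 ms on one GPU and 25 ms on four GPUs gives a speedup of 3.2 and an efficiency of 0.8, but still misses the 20 ms budget at 50 fps.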

27 Multi-GPU System
[Figure: system diagram with CPUs, memory, and GPUs; links labelled a-d]
4U TYAN FT72-B7015
- Host: 2 x 6-core Intel Xeon, 96 GB
- Devices: 8 x GeForce GTX 580, 12.7 TFLOPS and 1.53 TB/s
- Transfer bandwidths given in the diagram: 19 GB/s, 6 GB/s, 2.5 GB/s, 6 GB/s

28-30 Parallelization of Algorithm: Data Parallelism
[Figure labels: irgn, cg, Communication]

31 Results

32-33 Results - Details

34-36 P2P Transfer Optimization
4 GPUs, matrix 384x384
- naive: 1.28 ms
- 2D: 0.43 ms
- optimized sync: 0.37 ms
Speedup = 3.5
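As a quick check of the figures on this slide, reading the three times as naive, 2D, and optimized sync in that order, the quoted speedup is the ratio of the naive transfer time to the fully optimized one:

```latex
\[
\text{speedup}
  = \frac{t_{\text{naive}}}{t_{\text{optimized sync}}}
  = \frac{1.28\,\text{ms}}{0.37\,\text{ms}} \approx 3.46 \approx 3.5,
\qquad
\frac{t_{\text{naive}}}{t_{\text{2D}}}
  = \frac{1.28\,\text{ms}}{0.43\,\text{ms}} \approx 2.98 .
\]
```

Most of the gain therefore comes from the 2D transfer scheme; the synchronization change contributes the remaining step from roughly 3.0 to 3.5.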

37 Generalization: A Multi-GPU Programming Library
- C++ library based on Boost
- Identical code for 1...N GPUs
- Full control over application
- Minimal overhead
- Full access to performance-relevant hardware features
- Integration of established algorithms and libraries
- CUDA backend (OpenCL backend in preparation)

38 Concept: Segmented Container
[Figure: data held in a segmented vector with one pointer and one size per device: (pointer0, size0) on GPU0, (pointer1, size1) on GPU1, (pointer2, size2) on GPU2]
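The bookkeeping behind a segmented container can be sketched on the host alone. The example below assumes the data is split into whole blocks (for instance one coil profile per block) and that blocks are dealt out as evenly as possible; `Segment` and `split_segments` are hypothetical names, and the splitting policy is an assumption rather than the library's documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous piece of the logical vector, owned by one device.
struct Segment {
    std::size_t offset;  // start index within the logical vector
    std::size_t size;    // number of elements in this segment
};

// Split `total` elements into one segment per device. Segment sizes are
// multiples of `block`; leftover blocks go to the first devices.
std::vector<Segment> split_segments(std::size_t total, std::size_t block,
                                    std::size_t devices) {
    assert(block > 0 && devices > 0 && total % block == 0);
    const std::size_t blocks = total / block;
    const std::size_t base = blocks / devices;
    const std::size_t extra = blocks % devices;
    std::vector<Segment> segments;
    std::size_t offset = 0;
    for (std::size_t d = 0; d < devices; ++d) {
        const std::size_t n = (base + (d < extra ? 1 : 0)) * block;
        segments.push_back({offset, n});
        offset += n;
    }
    return segments;
}
```

Splitting 10 blocks of 4 elements across 4 devices yields segment sizes 12, 12, 8, 8. In a real container each segment would additionally carry a device pointer, as in the pointer/size pairs of the figure.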

39 Segmented Container Usage
    // coil profile size (e.g. 512 * 512 * 10)
    std::size_t size = dim_x * dim_y * coil_profiles;

    // create runtime environment, use 4 GPUs
    environment e(create_dev_group(0, 4));

    // create vector in CPU main memory to store coil profiles
    std::vector<cfloat> Coils_host(size);

    // allocate memory for coil profiles across all 4 GPUs
    seg_dev_vector<cfloat> Coils_device(size, dim_x * dim_y);

    // copy from host to device
    copy(Coils_host, Coils_device.begin());

    // copy back from device to host
    copy(Coils_device, Coils_host.begin());

40 Intra-System Communication
[Figure: transfer patterns between CPU/GPU and GPU0-GPU2: copy, broadcast, scatter/gather, reduce, all-reduce]
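The semantics of the collective patterns named on this slide can be shown with ordinary host vectors standing in for device buffers. This is a hypothetical sketch of what broadcast, reduce, and all-reduce compute, using element-wise sums; it performs no GPU transfers and is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;  // stand-in for one device's memory

// broadcast: every device receives a full copy of the source.
void broadcast(const Buffer& source, std::vector<Buffer>& devices) {
    for (Buffer& d : devices) d = source;
}

// reduce: element-wise sum over all device buffers, returned to the caller.
Buffer reduce_sum(const std::vector<Buffer>& devices) {
    assert(!devices.empty());
    Buffer result(devices[0].size(), 0.0f);
    for (const Buffer& d : devices) {
        assert(d.size() == result.size());
        for (std::size_t i = 0; i < d.size(); ++i) result[i] += d[i];
    }
    return result;
}

// all-reduce: reduce, then broadcast the result back to every device.
void all_reduce_sum(std::vector<Buffer>& devices) {
    broadcast(reduce_sum(devices), devices);
}
```

After `all_reduce_sum`, every buffer holds the same summed data, which is what an iterative solver needs when each device has computed a partial result.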

41 Interface and Integration
All memory transfers (except reduce) go through one function:
    copy(source_range, destination_iterator);
Interoperability with other libraries and frameworks such as Boost uBLAS and MATLAB:
    #include "mex.h"
    #include <mgpu/container/dev_vector.hpp>
    #include <mgpu/transfer/copy.hpp>

    // MEX entry point; `size` (the element count) is not defined on the slide
    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        // real part of the first MATLAB input
        double const * i_real = mxGetPr(prhs[0]);
        // vector in GPU memory
        mgpu::dev_vector<double> dev(size);
        // host to device
        mgpu::copy(mgpu::make_range(i_real, i_real+size), dev.begin());
        plhs[0] = mxCreateNumericMatrix(size, 1, mxDOUBLE_CLASS, mxCOMPLEX);
        double * o_imag = mxGetPi(plhs[0]);
        // device to host, into the imaginary part of the output
        mgpu::copy(dev, o_imag);
    }
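The idea of routing every transfer through a single `copy(source_range, destination_iterator)` call can be mimicked in plain C++. The sketch below is a host-only stand-in: `sketch::range`, `sketch::make_range`, and `sketch::copy` are hypothetical names that only mirror the shape of the interface shown on the slide and do not touch GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Minimal iterator-pair range, so raw pointers can act as a source.
template <typename Iterator>
struct range {
    Iterator first;
    Iterator last;
    Iterator begin() const { return first; }
    Iterator end() const { return last; }
};

template <typename Iterator>
range<Iterator> make_range(Iterator first, Iterator last) {
    return range<Iterator>{first, last};
}

// Single entry point: any source with begin()/end(), any output iterator.
template <typename Range, typename OutputIterator>
OutputIterator copy(const Range& source, OutputIterator destination) {
    for (auto it = source.begin(); it != source.end(); ++it) {
        *destination = *it;
        ++destination;
    }
    return destination;
}

}  // namespace sketch
```

A library built this way can pick the transfer direction from the argument types (host range, device vector, segmented vector) while the calling code stays identical, which matches the "identical code for 1...N GPUs" goal stated earlier.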

42 MGPU Library Available here (alpha version) Schätz, Sebastian and Uecker, Martin. "A Multi-GPU Programming Library for Real-Time Applications." Algorithms and Architectures for Parallel Processing (2012):

43-44 Parallelization of Algorithm: Task Parallelism
Problem: initial value and temporal continuity
[Figure: timeline of work on the GPUs; axis: t in s]
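Why the initial value matters for task parallelism can be seen in a small timing model. The sketch assumes frames are dealt round-robin to the GPUs, frame i arrives at i * acq_ms, and frame i may only start once frame i - lag has finished because that result serves as its initial value. The `lag` parameter and the function `finish_times` are hypothetical modelling choices, not the scheme presented in the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Finish time (ms) of every frame under the assumptions described above.
std::vector<double> finish_times(std::size_t frames, std::size_t gpus,
                                 double acq_ms, double recon_ms,
                                 std::size_t lag) {
    assert(gpus > 0 && lag > 0);
    std::vector<double> finish(frames, 0.0);
    std::vector<double> gpu_free(gpus, 0.0);  // when each GPU becomes idle
    for (std::size_t i = 0; i < frames; ++i) {
        const std::size_t g = i % gpus;  // round-robin assignment
        // a frame cannot start before its data exists and its GPU is idle
        double start = std::max(static_cast<double>(i) * acq_ms, gpu_free[g]);
        // nor before the frame providing its initial value is finished
        if (i >= lag) start = std::max(start, finish[i - lag]);
        finish[i] = start + recon_ms;
        gpu_free[g] = finish[i];
    }
    return finish;
}
```

With 4 GPUs, 20 ms between frames and 50 ms per reconstruction, a dependency on the immediately preceding frame (lag 1) serializes everything and the delay grows with every frame, whereas a dependency on the frame four steps back (lag 4) keeps the delay constant at 50 ms. The price of the larger lag is a less recent initial value, which is the temporal-continuity problem named on the slide.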

45 Beating Heart Example

46 Singing Example

47 From Image to Video: Real-time Medical Imaging with MRI

EE290T: Advanced Reconstruction Methods for Magnetic Resonance Imaging. Martin Uecker

EE290T: Advanced Reconstruction Methods for Magnetic Resonance Imaging. Martin Uecker EE290T: Advanced Reconstruction Methods for Magnetic Resonance Imaging Martin Uecker Tentative Syllabus 01: Jan 27 Introduction 02: Feb 03 Parallel Imaging as Inverse Problem 03: Feb 10 Iterative Reconstruction

More information

Large scale Imaging on Current Many- Core Platforms

Large scale Imaging on Current Many- Core Platforms Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,

More information

arxiv: v1 [cs.dc] 7 Jan 2013

arxiv: v1 [cs.dc] 7 Jan 2013 A multi-gpu Programming Library for Real-Time Applications Sebastian Schaetz 1 and Martin Uecker 2 arxiv:1301.1215v1 [cs.dc] 7 Jan 2013 1 BiomedNMR Forschungs GmbH at the Max Planck Institute for biophysical

More information

Functional MRI in Clinical Research and Practice Preprocessing

Functional MRI in Clinical Research and Practice Preprocessing Functional MRI in Clinical Research and Practice Preprocessing fmri Preprocessing Slice timing correction Geometric distortion correction Head motion correction Temporal filtering Intensity normalization

More information

Multigrid algorithms on multi-gpu architectures

Multigrid algorithms on multi-gpu architectures Multigrid algorithms on multi-gpu architectures H. Köstler European Multi-Grid Conference EMG 2010 Isola d Ischia, Italy 20.9.2010 2 Contents Work @ LSS GPU Architectures and Programming Paradigms Applications

More information

SPM8 for Basic and Clinical Investigators. Preprocessing

SPM8 for Basic and Clinical Investigators. Preprocessing SPM8 for Basic and Clinical Investigators Preprocessing fmri Preprocessing Slice timing correction Geometric distortion correction Head motion correction Temporal filtering Intensity normalization Spatial

More information

SPM8 for Basic and Clinical Investigators. Preprocessing. fmri Preprocessing

SPM8 for Basic and Clinical Investigators. Preprocessing. fmri Preprocessing SPM8 for Basic and Clinical Investigators Preprocessing fmri Preprocessing Slice timing correction Geometric distortion correction Head motion correction Temporal filtering Intensity normalization Spatial

More information

Face Recognition. Programming Project. Haofu Liao, BSEE. Department of Electrical and Computer Engineering. Northeastern University.

Face Recognition. Programming Project. Haofu Liao, BSEE. Department of Electrical and Computer Engineering. Northeastern University. Face Recognition Programming Project Haofu Liao, BSEE June 23, 2013 Department of Electrical and Computer Engineering Northeastern University 1. How to build the PCA Mex Funtion 1.1 Basic Information The

More information

Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging, London, United Kingdom, 2

Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging, London, United Kingdom, 2 Grzegorz Tomasz Kowalik 1, Jennifer Anne Steeden 1, Bejal Pandya 1, David Atkinson 2, Andrew Taylor 1, and Vivek Muthurangu 1 1 Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging,

More information

Implementation of Parma Polyhedron Library -functions in MATLAB

Implementation of Parma Polyhedron Library -functions in MATLAB Implementation of Parma Polyhedron Library -functions in MATLAB Leonhard Asselborn Electrical and Computer Engineering Carnegie Mellon University Group meeting Oct. 21 st 2010 Overview Introduction Motivation

More information

EPI Data Are Acquired Serially. EPI Data Are Acquired Serially 10/23/2011. Functional Connectivity Preprocessing. fmri Preprocessing

EPI Data Are Acquired Serially. EPI Data Are Acquired Serially 10/23/2011. Functional Connectivity Preprocessing. fmri Preprocessing Functional Connectivity Preprocessing Geometric distortion Head motion Geometric distortion Head motion EPI Data Are Acquired Serially EPI Data Are Acquired Serially descending 1 EPI Data Are Acquired

More information

GPUs Open New Avenues in Medical MRI

GPUs Open New Avenues in Medical MRI GPUs Open New Avenues in Medical MRI Chris A. Cocosco D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig Dept. of Radiology, Medical Physics, UNIVERSITY MEDICAL CENTER FREIBURG 1 Our research group: Biomedical

More information

Basic fmri Design and Analysis. Preprocessing

Basic fmri Design and Analysis. Preprocessing Basic fmri Design and Analysis Preprocessing fmri Preprocessing Slice timing correction Geometric distortion correction Head motion correction Temporal filtering Intensity normalization Spatial filtering

More information

Role of Parallel Imaging in High Field Functional MRI

Role of Parallel Imaging in High Field Functional MRI Role of Parallel Imaging in High Field Functional MRI Douglas C. Noll & Bradley P. Sutton Department of Biomedical Engineering, University of Michigan Supported by NIH Grant DA15410 & The Whitaker Foundation

More information

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation

More information

A System for Interfacing MATLAB with External Software Geared Toward Automatic Differentiation

A System for Interfacing MATLAB with External Software Geared Toward Automatic Differentiation A System for Interfacing MATLAB with External Software Geared Toward Automatic Differentiation 02. Sept. 2006 - ICMS 2006 - Castro-Urdiales H. Martin Bücker, RWTH Aachen University, Institute for Scientific

More information

Numerical Algorithms on Multi-GPU Architectures

Numerical Algorithms on Multi-GPU Architectures Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications

More information

NIH Public Access Author Manuscript Med Phys. Author manuscript; available in PMC 2009 March 13.

NIH Public Access Author Manuscript Med Phys. Author manuscript; available in PMC 2009 March 13. NIH Public Access Author Manuscript Published in final edited form as: Med Phys. 2008 February ; 35(2): 660 663. Prior image constrained compressed sensing (PICCS): A method to accurately reconstruct dynamic

More information

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Michael Boyer, David Tarjan, Scott T. Acton, and Kevin Skadron University of Virginia IPDPS 2009 Outline Leukocyte

More information

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S

More information

USING LAPACK SOLVERS FOR STRUCTURED MATRICES WITHIN MATLAB

USING LAPACK SOLVERS FOR STRUCTURED MATRICES WITHIN MATLAB USING LAPACK SOLVERS FOR STRUCTURED MATRICES WITHIN MATLAB Radek Frízel*, Martin Hromčík**, Zdeněk Hurák***, Michael Šebek*** *Department of Control Engineering, Faculty of Electrical Engineering, Czech

More information

Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging

Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging Paper ID: 1195 No Institute Given 1 More Experiment Results 1.1 Visual Comparisons We apply all methods on four 2D

More information

Enhao Gong, PhD Candidate, Electrical Engineering, Stanford University Dr. John Pauly, Professor in Electrical Engineering, Stanford University Dr.

Enhao Gong, PhD Candidate, Electrical Engineering, Stanford University Dr. John Pauly, Professor in Electrical Engineering, Stanford University Dr. Enhao Gong, PhD Candidate, Electrical Engineering, Stanford University Dr. John Pauly, Professor in Electrical Engineering, Stanford University Dr. Greg Zaharchuk, Associate Professor in Radiology, Stanford

More information

design as a constrained maximization problem. In principle, CODE seeks to maximize the b-value, defined as, where

design as a constrained maximization problem. In principle, CODE seeks to maximize the b-value, defined as, where Optimal design of motion-compensated diffusion gradient waveforms Óscar Peña-Nogales 1, Rodrigo de Luis-Garcia 1, Santiago Aja-Fernández 1,Yuxin Zhang 2,3, James H. Holmes 2,Diego Hernando 2,3 1 Laboratorio

More information

Efficient AMG on Hybrid GPU Clusters. ScicomP Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann. Fraunhofer SCAI

Efficient AMG on Hybrid GPU Clusters. ScicomP Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann. Fraunhofer SCAI Efficient AMG on Hybrid GPU Clusters ScicomP 2012 Jiri Kraus, Malte Förster, Thomas Brandes, Thomas Soddemann Fraunhofer SCAI Illustration: Darin McInnis Motivation Sparse iterative solvers benefit from

More information

Applications of Berkeley s Dwarfs on Nvidia GPUs

Applications of Berkeley s Dwarfs on Nvidia GPUs Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse

More information

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA 3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires

More information

Respiratory Motion Compensation for Simultaneous PET/MR Based on Strongly Undersampled Radial MR Data

Respiratory Motion Compensation for Simultaneous PET/MR Based on Strongly Undersampled Radial MR Data Respiratory Motion Compensation for Simultaneous PET/MR Based on Strongly Undersampled Radial MR Data Christopher M Rank 1, Thorsten Heußer 1, Andreas Wetscherek 1, and Marc Kachelrieß 1 1 German Cancer

More information

Sparse sampling in MRI: From basic theory to clinical application. R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology

Sparse sampling in MRI: From basic theory to clinical application. R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology Sparse sampling in MRI: From basic theory to clinical application R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology Objective Provide an intuitive overview of compressed sensing

More information

Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing

Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing Real-Time Rigid id 2D-3D Medical Image Registration ti Using RapidMind Multi-Core Platform Georgia Tech/AFRL Workshop on Computational Science Challenge Using Emerging & Massively Parallel Computer Architectures

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

How to Optimize Geometric Multigrid Methods on GPUs

How to Optimize Geometric Multigrid Methods on GPUs How to Optimize Geometric Multigrid Methods on GPUs Markus Stürmer, Harald Köstler, Ulrich Rüde System Simulation Group University Erlangen March 31st 2011 at Copper Schedule motivation imaging in gradient

More information

Dynamic Autocalibrated Parallel Imaging Using Temporal GRAPPA (TGRAPPA)

Dynamic Autocalibrated Parallel Imaging Using Temporal GRAPPA (TGRAPPA) Magnetic Resonance in Medicine 53:981 985 (2005) Dynamic Autocalibrated Parallel Imaging Using Temporal GRAPPA (TGRAPPA) Felix A. Breuer, 1 * Peter Kellman, 2 Mark A. Griswold, 1 and Peter M. Jakob 1 Current

More information

Clinical Importance. Aortic Stenosis. Aortic Regurgitation. Ultrasound vs. MRI. Carotid Artery Stenosis

Clinical Importance. Aortic Stenosis. Aortic Regurgitation. Ultrasound vs. MRI. Carotid Artery Stenosis Clinical Importance Rapid cardiovascular flow quantitation using sliceselective Fourier velocity encoding with spiral readouts Valve disease affects 10% of patients with heart disease in the U.S. Most

More information

GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations

GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations Fred Lionetti @ CSE Andrew McCulloch @ Bioeng Scott Baden @ CSE University of California, San Diego What is heart modeling? Bioengineer

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

White Pixel Artifact. Caused by a noise spike during acquisition Spike in K-space <--> sinusoid in image space

White Pixel Artifact. Caused by a noise spike during acquisition Spike in K-space <--> sinusoid in image space White Pixel Artifact Caused by a noise spike during acquisition Spike in K-space sinusoid in image space Susceptibility Artifacts Off-resonance artifacts caused by adjacent regions with different

More information

MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures

MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010

More information

GADGETRON. Michael S. Hansen, PhD. Magnetic Resonance Technology Program National Institutes of Health - NHLBI

GADGETRON. Michael S. Hansen, PhD. Magnetic Resonance Technology Program National Institutes of Health - NHLBI GADGETRON Michael S. Hansen, PhD Magnetic Resonance Technology Program National Institutes of Health - NHLBI QUESTIONS/COMMENTS EMAIL: michael.hansen@nih.gov Twi8er: @ReconstructThis Outline Gadgetron

More information

Parallel Systems. Project topics

Parallel Systems. Project topics Parallel Systems Project topics 2016-2017 1. Scheduling Scheduling is a common problem which however is NP-complete, so that we are never sure about the optimality of the solution. Parallelisation is a

More information

Single Breath-hold Abdominal T 1 Mapping using 3-D Cartesian Sampling and Spatiotemporally Constrained Reconstruction

Single Breath-hold Abdominal T 1 Mapping using 3-D Cartesian Sampling and Spatiotemporally Constrained Reconstruction Single Breath-hold Abdominal T 1 Mapping using 3-D Cartesian Sampling and Spatiotemporally Constrained Reconstruction Felix Lugauer 1,3, Jens Wetzl 1, Christoph Forman 2, Manuel Schneider 1, Berthold Kiefer

More information

Respiratory Motion Estimation using a 3D Diaphragm Model

Respiratory Motion Estimation using a 3D Diaphragm Model Respiratory Motion Estimation using a 3D Diaphragm Model Marco Bögel 1,2, Christian Riess 1,2, Andreas Maier 1, Joachim Hornegger 1, Rebecca Fahrig 2 1 Pattern Recognition Lab, FAU Erlangen-Nürnberg 2

More information

The Near Future in Cardiac CT Image Reconstruction

The Near Future in Cardiac CT Image Reconstruction SCCT 2010 The Near Future in Cardiac CT Image Reconstruction Marc Kachelrieß Institute of Medical Physics (IMP) Friedrich-Alexander Alexander-University Erlangen-Nürnberg rnberg www.imp.uni-erlangen.de

More information

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018 S8630 - WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS Jakob Progsch, Mathias Wagner GTC 2018 1. Know your hardware BEFORE YOU START What are the target machines, how many nodes? Machine-specific

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Compressed Sensing Reconstructions for Dynamic Contrast Enhanced MRI

Compressed Sensing Reconstructions for Dynamic Contrast Enhanced MRI 1 Compressed Sensing Reconstructions for Dynamic Contrast Enhanced MRI Kevin T. Looby klooby@stanford.edu ABSTRACT The temporal resolution necessary for dynamic contrast enhanced (DCE) magnetic resonance

More information

Blind Sparse Motion MRI with Linear Subpixel Interpolation

Blind Sparse Motion MRI with Linear Subpixel Interpolation Blind Sparse Motion MRI with Linear Subpixel Interpolation Anita Möller, Marco Maaß, Alfred Mertins Institute for Signal Processing, University of Lübeck moeller@isip.uni-luebeck.de Abstract. Vital and

More information

Accelerated MRI Techniques: Basics of Parallel Imaging and Compressed Sensing

Accelerated MRI Techniques: Basics of Parallel Imaging and Compressed Sensing Accelerated MRI Techniques: Basics of Parallel Imaging and Compressed Sensing Peng Hu, Ph.D. Associate Professor Department of Radiological Sciences PengHu@mednet.ucla.edu 310-267-6838 MRI... MRI has low

More information

JCudaMP: OpenMP/Java on CUDA

JCudaMP: OpenMP/Java on CUDA JCudaMP: OpenMP/Java on CUDA Georg Dotzler, Ronald Veldema, Michael Klemm Programming Systems Group Martensstraße 3 91058 Erlangen Motivation Write once, run anywhere - Java Slogan created by Sun Microsystems

More information

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou Study and implementation of computational methods for Differential Equations in heterogeneous systems Asimina Vouronikoy - Eleni Zisiou Outline Introduction Review of related work Cyclic Reduction Algorithm

More information

Automatic Generation of Algorithms and Data Structures for Geometric Multigrid. Harald Köstler, Sebastian Kuckuk Siam Parallel Processing 02/21/2014

Automatic Generation of Algorithms and Data Structures for Geometric Multigrid. Harald Köstler, Sebastian Kuckuk Siam Parallel Processing 02/21/2014 Automatic Generation of Algorithms and Data Structures for Geometric Multigrid Harald Köstler, Sebastian Kuckuk Siam Parallel Processing 02/21/2014 Introduction Multigrid Goal: Solve a partial differential

More information

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University Part 1: General introduction Ch. Hoelbling Wuppertal University Lattice Practices 2011 Outline 1 Motivation 2 Hardware Overview History Present Capabilities 3 Programming model Past: OpenGL Present: CUDA

More information

Efficient Imaging Algorithms on Many-Core Platforms

Efficient Imaging Algorithms on Many-Core Platforms Efficient Imaging Algorithms on Many-Core Platforms H. Köstler Dagstuhl, 22.11.2011 Contents Imaging Applications HDR Compression performance of PDE-based models Image Denoising performance of patch-based

More information

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in

More information

GLIRT: Groupwise and Longitudinal Image Registration Toolbox

GLIRT: Groupwise and Longitudinal Image Registration Toolbox Software Release (1.0.1) Last updated: March. 30, 2011. GLIRT: Groupwise and Longitudinal Image Registration Toolbox Guorong Wu 1, Qian Wang 1,2, Hongjun Jia 1, and Dinggang Shen 1 1 Image Display, Enhancement,

More information

OpenFOAM + GPGPU. İbrahim Özküçük

OpenFOAM + GPGPU. İbrahim Özküçük OpenFOAM + GPGPU İbrahim Özküçük Outline GPGPU vs CPU GPGPU plugins for OpenFOAM Overview of Discretization CUDA for FOAM Link (cufflink) Cusp & Thrust Libraries How Cufflink Works Performance data of

More information

High-Performance Computing Using GPUs

High-Performance Computing Using GPUs High-Performance Computing Using GPUs Luca Caucci caucci@email.arizona.edu Center for Gamma-Ray Imaging November 7, 2012 Outline Slide 1 of 27 Why GPUs? What is CUDA? The CUDA programming model Anatomy

More information

GPU implementation for rapid iterative image reconstruction algorithm

GPU implementation for rapid iterative image reconstruction algorithm GPU implementation for rapid iterative image reconstruction algorithm and its applications in nuclear medicine Jakub Pietrzak Krzysztof Kacperski Department of Medical Physics, Maria Skłodowska-Curie Memorial

More information

FOREWORD TO THE SPECIAL ISSUE ON MOTION DETECTION AND COMPENSATION

FOREWORD TO THE SPECIAL ISSUE ON MOTION DETECTION AND COMPENSATION Philips J. Res. 51 (1998) 197-201 FOREWORD TO THE SPECIAL ISSUE ON MOTION DETECTION AND COMPENSATION This special issue of Philips Journalof Research includes a number of papers presented at a Philips

More information

PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters

PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters IEEE CLUSTER 2015 Chicago, IL, USA Luis Sant Ana 1, Daniel Cordeiro 2, Raphael Camargo 1 1 Federal University of ABC,

More information

Efficient MR Image Reconstruction for Compressed MR Imaging

Efficient MR Image Reconstruction for Compressed MR Imaging Efficient MR Image Reconstruction for Compressed MR Imaging Junzhou Huang, Shaoting Zhang, and Dimitris Metaxas Division of Computer and Information Sciences, Rutgers University, NJ, USA 08854 Abstract.

More information

WEIGHTED L1 AND L2 NORMS FOR IMAGE RECONSTRUCTION: FIRST CLINICAL RESULTS OF EIT LUNG VENTILATION DATA

WEIGHTED L1 AND L2 NORMS FOR IMAGE RECONSTRUCTION: FIRST CLINICAL RESULTS OF EIT LUNG VENTILATION DATA WEIGHTED L1 AND L2 NORMS FOR IMAGE RECONSTRUCTION: FIRST CLINICAL RESULTS OF EIT LUNG VENTILATION DATA PRESENTER: PEYMAN RAHMATI, PHD CANDIDATE, Biomedical Eng., Dept. of Systems and Computer Eng., Carleton

More information

Information about presenter
