GPU-Based Acceleration for CT Image Reconstruction

Xiaodong Yu
Advisor: Wu-chun Feng
Collaborators: Guohua Cao, Hao Gong

Outline
- Introduction and Motivation
- Background Knowledge
- Challenges and Proposed Solutions
- Experimental Results

Introduction and Motivation
Computed Tomography (CT)
- Indispensable imaging modality that relies on multiple X-ray projections of the subject
- Reconstructs 2D or 3D images from the projection vector collected at the detector
- 60 million patients underwent a CT scan in the US in 2002
- Disadvantage: X-rays are carcinogenic, so scans should follow the As Low As Reasonably Achievable (ALARA) principle

Introduction and Motivation
CT image reconstruction approaches
- Analytical methods
- Iterative methods
Filtered Backprojection (FBP) technique
- Typical analytical algorithm, based on the Fourier Slice Theorem
- Advantage: fast reconstruction
- Disadvantage: requires many projection angles, hence a higher radiation dose for patients
Algebraic Reconstruction Technique (ART)
- Basic iterative algorithm
- Advantage: produces better images from fewer projections
- Disadvantage: high computational cost
- One possible solution: map it to an HPC platform, e.g. a GPU

Background Knowledge
Physical meaning of the system matrix
- The X-ray source and detector rotate around the object to acquire different views
- Each system matrix entry is the weighting factor for the corresponding pixel and the corresponding ray
- The column index of the system matrix is the pixel index
- The row index of the system matrix is the ray index
- The system matrix is sparse

Background Knowledge
Physical meaning of the system matrix
[Figure: source and detector positions for View 1, View 2, ..., View n; each view consists of 1024 rays]

Background Knowledge
Algebraic Reconstruction Technique (ART)

$$f_j^{(n+1)} = f_j^{(n)} + \lambda_n \, \frac{p_i - S_i}{\sum_{j=1}^{N} w_{ij}^2} \, w_{ij}, \qquad S_i = \sum_{j=1}^{N} f_j^{(n)} w_{ij}$$

- The ray index i is determined by the iteration index n
- \lambda_n is the relaxation parameter
- f is the unknown image vector (N×1) of N = n×n pixels
- p is the projection vector (M×1) of M rays
- w is the system matrix (M×N), whose element w_{ij} is the weight coefficient representing the contribution of the jth pixel to the ith ray integral
[Figure: starting from the initial image(0), the updates for Ray 1, Ray 2, ... produce image(1), image(2), ..., up to image(n), the reconstructed image]
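The ART update can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' GPU code: it assumes a dense system matrix stored as a list of rows, a fixed relaxation parameter, and rays visited in order within each sweep. The names `art`, `num_sweeps`, and `relax` are hypothetical.

```python
def art(w, p, num_sweeps=50, relax=1.0):
    """Row-by-row ART on a small dense system w * f = p (illustrative only)."""
    num_pixels = len(w[0])
    f = [0.0] * num_pixels                      # initial image(0): all zeros
    for _ in range(num_sweeps):
        for i, row in enumerate(w):             # assumption: rays visited in order
            norm_sq = sum(wij * wij for wij in row)
            if norm_sq == 0.0:
                continue                        # ray touches no pixel
            s_i = sum(fj * wij for fj, wij in zip(f, row))   # S_i
            scale = relax * (p[i] - s_i) / norm_sq
            f = [fj + scale * wij for fj, wij in zip(f, row)]
    return f


# Toy 2-ray, 2-pixel system whose exact solution is f = [2, 3].
w = [[1.0, 0.0],
     [1.0, 1.0]]
p = [2.0, 5.0]
f = art(w, p)
```

Each inner step changes the image only along one row of the system matrix, which is why the cost is dominated by repeated sparse row accesses.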

Background Knowledge
The ABC of GPUs
- Two-level hardware parallelism
- Several independent streaming multiprocessors (SMs)
- 32 simple processing cores
- SIMT processing: control flow divergence must be minimized

Challenges and Proposed Solutions
Memory space requirement
- A Tesla GPU usually has 6 GB of global memory
- The uncompressed system matrix requires hundreds of GB of memory
- Even a CSR-compressed system matrix may require tens of GB of memory
- Further compression is needed

Image size | 180 views: nonzeros | 180 views: CSR size (GB) | 360 views: nonzeros | 360 views: CSR size (GB)
512^2      | 157M                | 1.17                     | 314M                | 2.34
1024^2     | 415M                | 3.09                     | 830M                | 6.19
2048^2     | 1237M               | 9.22                     | 2475M               | 18.44
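The sizes above can be checked with a rough estimate. The sketch below assumes 4-byte values, 4-byte indices, 1024 rays per view (as in the system matrix diagram), and sizes reported in units of 2^30 bytes; none of these is stated explicitly in the slides, and the function names are hypothetical.

```python
RAYS_PER_VIEW = 1024   # assumption taken from the system matrix diagram
GIB = 2 ** 30          # assumption: "GB" in the table means 2^30 bytes


def csr_size_gib(nnz, num_views, value_bytes=4, index_bytes=4):
    """Estimated CSR footprint: val + col_ind per nonzero, plus row_ptr."""
    num_rows = num_views * RAYS_PER_VIEW
    total = nnz * (value_bytes + index_bytes) + (num_rows + 1) * index_bytes
    return total / GIB


def dense_size_gib(image_side, num_views, value_bytes=4):
    """Footprint of the uncompressed (dense) system matrix."""
    num_rows = num_views * RAYS_PER_VIEW
    num_cols = image_side * image_side
    return num_rows * num_cols * value_bytes / GIB


csr_512_180 = round(csr_size_gib(157_000_000, 180), 2)     # 512^2, 180 views
csr_2048_360 = round(csr_size_gib(2_475_000_000, 360), 2)  # 2048^2, 360 views
dense_512_360 = dense_size_gib(512, 360)                   # uncompressed
```

Under these assumptions the estimate agrees with the table entries to two decimals, and the dense matrix for a 512^2 image with 360 views already reaches 360 GB.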

Challenges and Proposed Solutions
Symmetry-Based Compression
- Two kinds of view symmetry: rotational and reflection
- Only the weighting factors whose views lie within one 45-degree range need to be recorded
- Accordingly, only one eighth of the system matrix needs to be transferred to the GPU; all other entries can be calculated at run time
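One way to picture the fold is that every view angle reduces to a base angle within [0, 45] degrees, plus a number of quarter turns and an optional mirror flip. The sketch below covers only this angle bookkeeping, with integer degrees and a hypothetical function name `fold_view`. How pixel and ray indices are remapped at run time is not described in the slides and is not shown.

```python
def fold_view(angle_deg):
    """Map a view angle to (base_angle, quarter_turns, mirrored).

    base_angle lies in [0, 45]; the original angle equals
    90 * quarter_turns + base_angle when not mirrored, and
    90 * (quarter_turns + 1) - base_angle when mirrored.
    """
    quarter_turns, remainder = divmod(angle_deg % 360, 90)
    if remainder <= 45:
        return remainder, quarter_turns, False
    return 90 - remainder, quarter_turns, True


# 360 views at 1-degree spacing collapse onto the base angles 0..45.
base_angles = {fold_view(a)[0] for a in range(360)}
num_base_views = len(base_angles)
```

With 360 views at integer degrees, only 46 distinct base angles remain, which is close to the one-eighth figure; the exact fraction depends on where the views fall relative to the 0 and 45 degree boundaries.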

Challenges and Proposed Solutions
Load Imbalance
- Each row of the system matrix is assigned to one thread
- Different rows have different numbers of nonzero elements, so different threads have different workloads
- Load imbalance degrades performance

Challenges and Proposed Solutions
- Intuitive idea: sort
[Figure: the sorting solution]
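The effect of sorting can be illustrated with a simple cost model. This sketch assumes that rows are sorted by their nonzero count, that threads are grouped into warps of 32 consecutive rows, and that a warp takes as long as its longest row; the slides do not spell out these details, and `warp_cost` is a hypothetical helper.

```python
def warp_cost(row_nnz, warp_size=32):
    """Sum over warps of the longest row in each warp (one row per thread)."""
    cost = 0
    for start in range(0, len(row_nnz), warp_size):
        cost += max(row_nnz[start:start + warp_size])
    return cost


# Toy workload: short and long rows interleaved across 64 rows.
row_nnz = [1, 100] * 32

# Permutation that orders rows by nonzero count; a real kernel would need
# this permutation to write each result back to its original row.
order = sorted(range(len(row_nnz)), key=lambda i: row_nnz[i])
sorted_nnz = [row_nnz[i] for i in order]

unsorted_cost = warp_cost(row_nnz)    # every warp waits for a 100-entry row
sorted_cost = warp_cost(sorted_nnz)   # short rows share one warp, long rows the other
```

In this toy case the modeled cost drops from 200 to 101 because rows of similar length end up in the same warp.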

Experimental Results
Symmetry-Based Compression

Image size | 360 views: CSR size (GB) | 360 views: w/ sym-based compression (GB) | 720 views: CSR size (GB) | 720 views: w/ sym-based compression (GB)
256^2      | 1.01                     | 0.12                                     | 2.03                     | 0.25
512^2      | 2.34                     | 0.29                                     | 4.82                     | 0.60
1024^2     | 6.19                     | 0.77                                     | 11.97                    | 1.50
2048^2     | 18.44                    | 2.31                                     | 37.41                    | 4.68
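As a quick consistency check, the compressed sizes can be compared with the one-eighth ratio expected from the symmetry argument. The snippet only re-uses the numbers reported above; the dictionary layout and names are illustrative.

```python
# (image size, views) -> (CSR size in GB, size with symmetry-based compression in GB)
reported = {
    ("256^2", 360): (1.01, 0.12),
    ("512^2", 360): (2.34, 0.29),
    ("1024^2", 360): (6.19, 0.77),
    ("2048^2", 360): (18.44, 2.31),
    ("256^2", 720): (2.03, 0.25),
    ("512^2", 720): (4.82, 0.60),
    ("1024^2", 720): (11.97, 1.50),
    ("2048^2", 720): (37.41, 4.68),
}

# Ratio of compressed to uncompressed size for every configuration.
ratios = {key: compressed / csr for key, (csr, compressed) in reported.items()}
worst_deviation = max(abs(r - 1 / 8) for r in ratios.values())
```

Every ratio stays within 0.01 of 1/8, so the reported sizes are consistent with storing one eighth of the matrix, up to rounding in the table.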

Experimental Results
Performance Comparisons
- Two image sizes: 1024*1024 and 512*512
- Both use 720 views
- Both run 10 iterations
[Figure: bar chart of execution time (seconds) for CPU, cuSPARSE, CUDA code1, and CUDA code2 at image sizes 1024^2 and 512^2. Data labels in the chart: 1328.21, 390.43, 132.7, 31.91, 29.61, 51.92, 9.87, 8.29]

Back up

COO
Coordinate list (COO)
- Basic format for sparse matrix compression
- Stores a list of (row, column, value) tuples
- row, column, and value are three arrays
- row stores the row indices of the nonzero entries
- column stores the column indices of the nonzero entries
- value stores the values of the nonzero entries
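A small example may make the three COO arrays concrete. This is an illustrative sketch in plain Python; the helper name `dense_to_coo` and the 3×3 matrix are made up for the example.

```python
def dense_to_coo(dense):
    """Collect (row, column, value) for every nonzero, in row-major order."""
    row, column, value = [], [], []
    for i, dense_row in enumerate(dense):
        for j, x in enumerate(dense_row):
            if x != 0:
                row.append(i)
                column.append(j)
                value.append(x)
    return row, column, value


dense = [[5, 0, 0],
         [0, 0, 3],
         [2, 0, 4]]
row, column, value = dense_to_coo(dense)
# row    = [0, 1, 2, 2]
# column = [0, 2, 0, 2]
# value  = [5, 3, 2, 4]
```

Note that the row array repeats an index once per nonzero in that row, which is the redundancy CSR removes.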

CSR
Compressed Sparse Row (CSR)
- Another basic format for sparse matrix compression
- Avoids the off-chip memory pressure of COO (accessing row indices)
- Consists of three arrays: val, col_ind, row_ptr
- val stores the values of the nonzero entries
- col_ind stores the column indices of the nonzero entries
- row_ptr stores, for each row, the offset in val and col_ind where that row's nonzeros start
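The CSR arrays and the row-wise product they support can be sketched as follows. This is a plain Python illustration with hypothetical helper names; it assumes row_ptr carries one extra trailing entry equal to the number of nonzeros, a common convention that the slide does not state.

```python
def dense_to_csr(dense):
    """Build val, col_ind, row_ptr from a dense matrix (list of rows)."""
    val, col_ind, row_ptr = [], [], [0]
    for dense_row in dense:
        for j, x in enumerate(dense_row):
            if x != 0:
                val.append(x)
                col_ind.append(j)
        row_ptr.append(len(val))    # end of this row, start of the next
    return val, col_ind, row_ptr


def csr_matvec(val, col_ind, row_ptr, x):
    """y = A * x, one output entry per row (one row per thread on a GPU)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += val[k] * x[col_ind[k]]
        y.append(acc)
    return y


dense = [[5, 0, 0],
         [0, 0, 3],
         [2, 0, 4]]
val, col_ind, row_ptr = dense_to_csr(dense)   # row_ptr = [0, 1, 2, 4]
y = csr_matvec(val, col_ind, row_ptr, [1, 2, 3])
```

The inner loop length is row_ptr[i + 1] - row_ptr[i], so rows with more nonzeros take longer, which is the source of the load imbalance between threads.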