RapidMind. Accelerating Medical Imaging. May 13, 2009

Similar documents
Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing

The RapidMind Platform for Portable Programming of Multi-Core Processors and Many-Core Accelerators

high performance medical reconstruction using stream programming paradigms

CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

INDUSTRIAL SYSTEM DEVELOPMENT FOR VOLUMETRIC INTEGRITY

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Medical Image Processing: Image Reconstruction and 3D Renderings

REAL-TIME ADAPTIVITY IN HEAD-AND-NECK AND LUNG CANCER RADIOTHERAPY IN A GPU ENVIRONMENT

DiFi: Distance Fields - Fast Computation Using Graphics Hardware

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

Accelerating image registration on GPUs

3D Registration based on Normalized Mutual Information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

William Yang Group 14 Mentor: Dr. Rogerio Richa Visual Tracking of Surgical Tools in Retinal Surgery using Particle Filtering

A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT

Refraction Corrected Transmission Ultrasound Computed Tomography for Application in Breast Imaging

Introduction to Medical Image Processing

Addressing Heterogeneity in Manycore Applications

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

GPU acceleration of 3D forward and backward projection using separable footprints for X-ray CT image reconstruction

Turbostream: A CFD solver for manycore

ADVANCING CANCER TREATMENT

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

3D Slicer Overview. Andras Lasso, PhD PerkLab, Queen s University

Computer-Aided Diagnosis in Abdominal and Cardiac Radiology Using Neural Networks

Computed Tomography. Principles, Design, Artifacts, and Recent Advances. Jiang Hsieh THIRD EDITION. SPIE PRESS Bellingham, Washington USA

Accelerating Double Precision FEM Simulations with GPUs

Limited view X-ray CT for dimensional analysis

Towards Breast Anatomy Simulation Using GPUs

ADVANCING CANCER TREATMENT

Comparison of High-Speed Ray Casting on GPU

Towards deformable registration for augmented reality in robotic assisted partial nephrectomy

The Insight Toolkit. Image Registration Algorithms & Frameworks

Modeling and preoperative planning for kidney surgery

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

GPU implementation for rapid iterative image reconstruction algorithm

Lecture 1: Introduction and Computational Thinking

ECE 8823: GPU Architectures. Objectives

MEDICAL IMAGE ANALYSIS

Non-Destructive Failure Analysis and Measurement for Molded Devices and Complex Assemblies with X-ray CT and 3D Image Processing Techniques

REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS

Multi-View Stereo for Static and Dynamic Scenes

RapidMind & PGI Accelerator Compiler. Dr. Volker Weinberg Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften

X-TRACT: software for simulation and reconstruction of X-ray phase-contrast CT

3/27/2012 WHY SPECT / CT? SPECT / CT Basic Principles. Advantages of SPECT. Advantages of CT. Dr John C. Dickson, Principal Physicist UCLH

ClearSpeed Visual Profiler

Accelerated C-arm Reconstruction by Out-of-Projection Prediction

CT NOISE POWER SPECTRUM FOR FILTERED BACKPROJECTION AND ITERATIVE RECONSTRUCTION

Respiratory Motion Compensation for C-arm CT Liver Imaging

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT

Combination of Markerless Surrogates for Motion Estimation in Radiation Therapy

Thrust ++ : Portable, Abstract Library for Medical Imaging Applications

Optimization of Cone Beam CT Reconstruction Algorithm Based on CUDA

Registration concepts for the just-in-time artefact correction by means of virtual computed tomography

Accelerating CFD with Graphics Hardware

GPGPU. Peter Laurens 1st-year PhD Student, NSC

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

Recent Advances in Heterogeneous Computing using Charm++

Ch. 4 Physical Principles of CT

Index. aliasing artifacts and noise in CT images, 200 measurement of projection data, nondiffracting

The GPU as a co-processor in FEM-based simulations. Preliminary results. Dipl.-Inform. Dominik Göddeke.

MEDICAL IMAGE NOISE REDUCTION AND REGION CONTRAST ENHANCEMENT USING PARTIAL DIFFERENTIAL EQUATIONS

AIDR 3D Iterative Reconstruction:

Biomedical Imaging Registration Trends and Applications. Francisco P. M. Oliveira, João Manuel R. S. Tavares

Image Registration Lecture 1: Introduction

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

GeoProbe Geophysical Interpretation Software

BLUT : Fast and Low Memory B-spline Image Interpolation

ThE ultimate, INTuITIVE Mr INTErFAcE

Medical Images Analysis and Processing

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Evolution of Big Data Platforms and Data Science

Object Identification in Ultrasound Scans

Stream Processing with CUDA TM A Case Study Using Gamebryo's Floodgate Technology

FIELD PARADIGM FOR 3D MEDICAL IMAGING: Safer, More Accurate, and Faster SPECT/PET, MRI, and MEG

Advanced Visual Medicine: Techniques for Visual Exploration & Analysis

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Directed Optimization On Stencil-based Computational Fluid Dynamics Application(s)

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CME 213 S PRING Eric Darve

CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN

Automatic Intra-Application Load Balancing for Heterogeneous Systems

Deviceless respiratory motion correction in PET imaging exploring the potential of novel data driven strategies

Direct Rendering of Trimmed NURBS Surfaces

GPU Accelerating Speeded-Up Robust Features Timothy B. Terriberry, Lindley M. French, and John Helmsen

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Acknowledgments. Nesterov s Method for Accelerated Penalized-Likelihood Statistical Reconstruction for C-arm Cone-Beam CT.

Deformable and Fracturing Objects

Dell EMC ScaleIO Ready Node

Spatio-Temporal Registration of Biomedical Images by Computational Methods

Dragonfly Pro. Visual Pathway to Quantitative Answers ORS. Exclusive to ZEISS OBJECT RESEARCH SYSTEMS

Integrating Modalities to Enterprise PACS through Thinking Systems

GPU applications in Cancer Radiation Therapy at UCSD. Steve Jiang, UCSD Radiation Oncology Amit Majumdar, SDSC Dongju (DJ) Choi, SDSC

Large scale Imaging on Current Many- Core Platforms

Spiral ASSR Std p = 1.0. Spiral EPBP Std. 256 slices (0/300) Kachelrieß et al., Med. Phys. 31(6): , 2004

GPU ACCELERATED TOTAL FOCUSING METHOD IN CIVA

SIGMI. ISL & CGV Joint Research Proposal ~Image Fusion~

Volume Graphics Introduction

Transcription:

RapidMind Accelerating Medical Imaging May 13, 2009

Outline Medical imaging software challenges Case studies Elastography, registration, breast cancer screening Platform System overview Detailed example: algebraic reconstruction Medical imaging solutions Imaging extensions Registration component Component roadmap Slide 2

Medical Imaging Software Development Challenges Need for application acceleration Growing data sizes Mainstreaming of volume imaging Need for efficient workflows and interactivity Need for advanced volume processing increasing Insufficient development resources GPUs and multi-core CPUs are capable but difficult to program Hardware choices in constant flux High costs associated with changing development tools Need to complete projects faster Need to reduce effort spent on non-core development Want to focus effort on value-added features Slide 3

Challenges Need for Application Acceleration Growing data sizes Toshiba 320-slice dynamic volume CT imager 4D perfusion and cardiac studies Mainstreaming of volume imaging Frost and Sullivan: 3D studies now standard, 4D usage increasing Need for efficient workflows and interactivity Radiologists paid by case Need for advanced volume processing Automated cancer screening Automated Alzheimer s screening Multimodality studies Stroke evaluation Real-time registration for image-guided surgery Real-time registration for radiation therapy Slide 4

Medical Imaging and Visualization Case Studies University of Waterloo 2D/3D Registration of CT and X-Ray data for image-guided surgery With RapidMind 100x speed up driving time to 3 seconds Image analysis is rendered real-time Cancer screening and diagnosis is immediate, and can be used to guide non-invasive surgeries Dartmouth Medical School Quasi-static Elastography image analysis used to diagnose breast cancer from ultra-sound data With RapidMind 10x speed up driving time to 1 second Image analysis is rendered real-time Cancer screening and diagnosis is immediate, and can be used to guide non-invasive surgeries RapidMind Demonstration Algebraic reconstruction With RapidMind 60x speedup Reconstruction completed faster than data can be acquired Possible to use more flexible imaging geometries and obtain higher image quality than is possible with traditional cone-beam backprojection 5 Pioneer in the development of medical software solutions that help improve workflow and productivity B-CAD V1 Computer-aided Detection of breast cancer from ultrasound images With RapidMind 9x speed up on multi-core CPU with GPU acceleration A heterogeneous application that leverages multi-core with seamless GPU acceleration Increased accuracy of diagnosis Slide 5

Medical Imaging Case Studies 2D/3D Registration 100x RapidMind implementation runs over 100 times faster than serial C++ baseline 2D/3D registration finds rotation of 3D volume so its projection aligns with given 2D image Combines volume rendering for projection with computation of registration metric and optimization Useful for contrast enhancement in image-guided surgery For more information see: Lin Xu and Justin Wan, Real- Time Intensity-Based Rigid 2D-3D Medical Image Registration Using RapidMind Multi-Core Platform. Slide 6

Medical Imaging Case Studies Computer-aided Detection of Breast Cancer 9x RapidMind implementation runs 9 times faster than original design a performance requirement for customer acceptance Medipattern s award-winning flagship B-CAD solution helps characterize, evaluate, and document breast ultrasound images in a thorough, organized way that integrates easily into daily practice. Raw Ultrasound Image Lesion detected and marked Slide 7

Medical Imaging Case Studies Elastography Image Analysis 10x RapidMind implementation runs 10 times faster achieving interactive user experience Raw Ultrasound Image Correlated Ultrasound Image A closer look: NVIDIA 8800 GPU Original MATLAB implementation: 240 seconds to render C++ implementation: 10 seconds to render RapidMind implementation: 1 second to render Source: Dr. Susan Swartz, Dartmouth Medical School, Dr. Marvin Doyley, Professor of Radiology and John Wallace, Academic Computing. Slide 8

Medical Imaging Case Studies Algebraic Reconstruction 60x RapidMind s algebraic reconstruction implementation can reconstruct an entire volume in under 3 seconds using a single GPU X-rays projecting through the volume Visualization of algebraic reconstruction of CT data A closer look: GPU AMD quad-core Opteron with ATI 1900 GPU 6 iterations used to reconstruct a 256x256x112 volume in 3s total Slide 9

RapidMind Product Overview RapidMind provides: 1. A flexible platform that allows an arbitrary algorithm to be expressed and efficiently mapped to both multi-core CPUs and GPUs 2. Accelerated volume processing components that provide core building blocks for medical imaging applications Slide 10

RapidMind Platform Benefits The RapidMind Multi-Core Development Platform provides software organizations with: C++ Application Automated Data Parallel Data distribution across multiple cores in a multi-core processor Productivity Build and deliver multi-core capable applications faster using existing practices, tools and compilers Performance Significant performance gain on one core, multiple cores, or accelerators Portability Applications are hardware independent and will automatically scale to additional cores and future multi-core processors and accelerators Copyright 2008, RapidMind, Inc. Slide 11

Portability Now and Into the Future Trends: Massive parallelism, heterogeneity, and hybrid computing RapidMind provides portability, scalability and future-proofing Slide 12

RapidMind Platform System Architecture API Intuitive, integrates with C++, and requires no new tools or workflow Platform Code Optimizer analyzes and optimizes computations to remove overhead Load Balancer plans and synchronizes work to keep all cores fully utilized Data Manager reduces data bottlenecks Logging/Diagnostics detects and reports performance bottlenecks Processor Support Modules x86 processors from AMD and Intel ATI/AMD and NVIDIA GPUs Cell Blade, Cell Accelerator Board, PS3 Copyright 2008, RapidMind, Inc. Slide 13

RapidMind Platform Programming Model Map function over array: Parallel application: C = f(a,b) Can use control flow (SPMD model) Can random access other arrays Can declare and use local arrays Can call sub-programs Can read and write to subarrays Collective operations: Reduce: a = reduce(p,a) Others Copyright 2008, RapidMind, Inc. Slide 14

RapidMind Platform Interface Summary Usage: Include platform header Link to runtime library Data: Values Arrays Data abstraction Programs: Defined dynamically Execute on coprocessors Code abstraction #include <rapidmind/platform.hpp> using namespace rapidmind; Value1f f = 2.0f; Array<2,Value3f> a(512,512); Array<2,Value3f> b(512,512); Program prog = BEGIN { In<Value3f> r, s; Out<Value3f> q; q = (r + s) * f; } END; a = prog(a,b); f = 3.0f; stride(a,2,2) = prog( slice(a,0,255,0,255), slice(b,256,511,0,255)); Copyright 2008, RapidMind, Inc. Slide 15

Medical Imaging Case Studies Algebraic Reconstruction 60x RapidMind s algebraic reconstruction implementation can reconstruct an entire volume in under 3 seconds using a single GPU X-rays projecting through the volume Visualization of algebraic reconstruction of CT data A closer look: GPU AMD quad-core Opteron with ATI 1900 GPU 6 iterations used to reconstruct a 256x256x112 volume in 3s total Slide 16

Algebraic Reconstruction Kaczmarz Iterative Method Let y be the measured projections Let z be the (unknown) volume Let A be the volume-to-projection mapping Need to solve y = Az for z given y Kaczmarz iteration: z (i+1) z (i) + A T R (y A z (i) ) Interpretation: A is forward projection y A z is correction A T is backprojection R is filtering (preconditioner; sharpening) Copyright 2008, RapidMind, Inc. Slide 17

Algebraic Reconstruction Kaczmarz Iterative Method Array<2,Value4f> reconstruct ( Array<2,Value4f> y ) { sharp = 100.0f; beta = 3.0f; Array<2,Value4f> z; z = backproject(y); for (int i = 1; i < n_iterations; i++) { Array<2,Value4f> c = project(y,z); z = backproject(z,c); sharp *= 0.05f; // converge to zero float blend = 0.7f; beta = blend*beta + (1.0f-blend)*0.9f; } return z; } Copyright 2008, RapidMind, Inc. Slide 18

Algebraic Reconstruction Kaczmarz Iterative Method prog_backproj = BEGIN { In<Value2i> u; In<Value4f> base; Out<Value4f> r = 0; Value1f theta = 0; Value1f theta_inc = float(pi/data_height); Value2f p = Value2f(u)*Value2f(1.0/data_height,1.0/data_width) + Value2f(-0.5,-0.5); Value2f c = Value2f(-sharp,1.0f+2.0f*sharp); Value1i i = N; DO { Value2f d = Value2f(sin(theta),-cos(theta)); Value1f t = dot(p,d)+0.5f; Value2f v = Value2f(t,theta/PI); Value2f w = floor(v); Value2f f; f(0) = v(0) - w(0); f(1) = 1.0f - f(0); Value4f ww = w(0,0,0,1) + Value4f(-1,1,2,0); Value4f s1 = data[w]; Value4f s2 = data[ww(1,3)]; Value4f s0 = data[ww(0,3)]; Value4f s3 = data[ww(2,3)]; Value4f h0 = c(0)*s0 + c(1)*s1 + c(0)*s2; Value4f h1 = c(0)*s1 + c(1)*s2 + c(0)*s3; r += f(1)*h0 + f(0)*h1; theta += theta_inc; } UNTIL (0 == --i); r = base + beta * r / float(n); } END; Copyright 2008, RapidMind, Inc. Slide 19

Extensions RapidMind Imaging Extensions Basic libraries and platform support for imaging Common patterns of parallel computation in imaging are built into the platform Point operations and stencil-based filtering operations are native to the platform Category reductions for computations on irregular blobs are built-in operations (also an implementation of map-reduce ) Libraries for common operations: K-means quantization Flood-fill labeling of connected components Noise reduction Slide 20

RapidMind Imaging Extensions Slide 21

RapidMind Imaging Extensions Slide 22

RapidMind Imaging Components Components Libraries that solve some of the most challenging enabling functionality in the volume processing pipeline Filtering Noise Reduction Accelerate performance by leveraging multiple CPU cores and GPU hardware acceleration Minimum of 10x the performance of existing CPU-based implementations Flexibility to customize through configuration, source license, or professional services Inhomogeneity Correction Rigid Registration Deformable Registration Order of magnitude performance improvement can enable you to deliver new workflows and new applications Segmentation Volume Rendering Slide 23

Key features and functionality Imaging Components MSD Rigid Registration Rigid registration with Euclidean transformations Mean squared differences comparison metric Linear interpolation of moving volume Bounding-box ROI on fixed volume Two volumes of up to 512x512x256 voxels Support for 16-bit precision data Use cases Comparison of rigid volumes taken at different times with the same modality Slide 24

RapidMind Rigid Registration Performance Test case using entire 512x512x64 volume and a GeForce 280 GPU Inner loop performance: Approximately 25 SSD comparison metric evaluations Metric comparisons require 95% of runtime Speedups of 75x to 95x observed over serial ITK baseline Overall performance: Serial ITK baseline: 254s RapidMind implementation: 16s (approx 16x faster overall) Observations: After acceleration, overall performance dominated by non-accelerated serial preprocessing. A portion of that could also be accelerated. ITK is still useful for preprocessing, as long as these operations do not dominate performance. Slide 25

RapidMind Rigid Registration Performance vs. ROI Size 1000 900 800 700 600 500 400 300 200 100 0 2673300 8019900 13366500 18713100 24059700 Metric ITK ms 25000 20000 15000 10000 5000 0 2673300 8019900 13366500 18713100 24059700 RR ITK ms Metric Speedup Rigid Registration Speedup 50 40 30 20 10 0 2673300 8019900 13366500 18713100 24059700 Metric 16 14 12 10 8 6 4 2 0 2673300 8019900 13366500 18713100 24059700 Rigid Registration Slide 26

Conclusion Matching Goals and Capabilities Goal Reduce development costs Reduce time to market Reduce risk Capabilities Needed Application developers can be responsible for optimization without having to become GPU programming experts Architects can leverage a set of fundamental components that let them concentrate on specific value-add functionality Leverage existing accelerated libraries for volume processing Access an external group of experts in developing accelerated imaging algorithms Simultaneously support both NVIDIA and ATI GPUs Solution will continue to work on future processors such as Intel Larrabee Volume processing capabilities built on a customizable imaging platform that is easily extended Slide 27

RapidMind Conclusion Since 2004, entire company dedicated to delivering accelerated solutions on GPUs and multi-core CPUs Spun off from 5 years of University of Waterloo research Company is a recognized leader in accelerated computing RapidMind has helped software organizations accelerate their imaging applications by an order of magnitude RapidMind provides A general-purpose software development platform that allows users to implement and deploy their own high-performance portable algorithms on accelerators Customizable and portable core software components for accelerated medical imaging on both GPUs and multi-core CPUs Slide 28