GPU-based high-performance computing for radiotherapy applications

Similar documents
gpmc: GPU-Based Monte Carlo Dose Calculation for Proton Radiotherapy Xun Jia 8/7/2013

Outline. Outline 7/24/2014. Fast, near real-time, Monte Carlo dose calculations using GPU. Xun Jia Ph.D. GPU Monte Carlo. Clinical Applications

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

General Purpose GPU Computing in Partial Wave Analysis

Customizable and Advanced Software for Tomographic Reconstruction

Op#miza#on of CUDA- based Monte Carlo Simula#on for Radia#on Therapy. GTC 2014 N. Henderson & K. Murakami

Tesla GPU Computing A Revolution in High Performance Computing

The OpenGATE Collaboration

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

High-Performance Computing Using GPUs

Tesla GPU Computing A Revolution in High Performance Computing

Image-based Monte Carlo calculations for dosimetry

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

ATLAS Tracking Detector Upgrade studies using the Fast Simulation Engine

CS427 Multicore Architecture and Parallel Computing

An approach to calculate and visualize intraoperative scattered radiation exposure

Technology for a better society. hetcomp.com

GPU-accelerated ray-tracing for real-time treatment planning

GPU applications in Cancer Radiation Therapy at UCSD. Steve Jiang, UCSD Radiation Oncology Amit Majumdar, SDSC Dongju (DJ) Choi, SDSC

Practical Introduction to CUDA and GPU

HPC with Multicore and GPUs

Tutorial: Parallel programming technologies on hybrid architectures HybriLIT Team

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

A fast and accurate GPU-based proton transport Monte Carlo simulation for validating proton therapy treatment plans

In-Situ Statistical Analysis of Autotune Simulation Data using Graphical Processing Units

Advanced CUDA Optimization 1. Introduction

CUDA Programming Model

CUDA 7.5 OVERVIEW WEBINAR 7/23/15

VSC Users Day 2018 Start to GPU Ehsan Moravveji

The new version of the GATE simulation platform

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPU. Peter Laurens 1st-year PhD Student, NSC

CUDA Toolkit 5.0 Performance Report. January 2013

Using OpenACC With CUDA Libraries

GPU A rchitectures Architectures Patrick Neill May

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.

Measurement of depth-dose of linear accelerator and simulation by use of Geant4 computer code

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

Multi-Processors and GPU

CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0. Julien Demouth, NVIDIA

CSC573: TSHA Introduction to Accelerators

Comparison of High-Speed Ray Casting on GPU

GPU programming. Dr. Bernhard Kainz

GPU Computing Past, Present, Future. Ian Buck, GM GPU Computing Sw

CUDA- based Geant4 Monte Carlo Simula8on for Radia8on Therapy. N. Henderson & K. Murakami GTC 2013

high performance medical reconstruction using stream programming paradigms

Mathematical methods and simulations tools useful in medical radiation physics

Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing

CUDA (Compute Unified Device Architecture)

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA)

Pat Hanrahan. Modern Graphics Pipeline. How Powerful are GPUs? Application. Command. Geometry. Rasterization. Fragment. Display.

From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA (S5133)

Towards a Performance- Portable FFT Library for Heterogeneous Computing

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

GPU-Accelerated Deep Learning

CUDA math libraries APC

DEEP NEURAL NETWORKS AND GPUS. Julie Bernauer

Arezoo Modiri Department of Radiation Oncology University of Maryland, Baltimore

Tesla Architecture, CUDA and Optimization Strategies

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

REAL-TIME ADAPTIVITY IN HEAD-AND-NECK AND LUNG CANCER RADIOTHERAPY IN A GPU ENVIRONMENT

Accelerating GATE simulations

THE LEADER IN VISUAL COMPUTING

Complex Systems Simulations on the GPU

Basics of treatment planning II

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Current Trends in Computer Graphics Hardware

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

Ian Buck, GM GPU Computing Software

Future Directions for CUDA Presented by Robert Strzodka

From Brook to CUDA. GPU Technology Conference

radiotherapy Andrew Godley, Ergun Ahunbay, Cheng Peng, and X. Allen Li NCAAPM Spring Meeting 2010 Madison, WI

Introduction to CUDA

CMSC 858M/AMSC 698R. Fast Multipole Methods. Nail A. Gumerov & Ramani Duraiswami. Lecture 20. Outline

MPEXS benchmark results

CUDA 6.0 Performance Report. April 2014

S CUDA on Xavier

Massively Parallel Computing with CUDA. Carlos Alberto Martínez Angeles Cinvestav-IPN

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

CUDA Architecture & Programming Model

Abstract. Introduction. Kevin Todisco

Large scale Imaging on Current Many- Core Platforms

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

GPU ARCHITECTURE Chris Schultz, June 2017

Accelerating koblinger's method of compton scattering on GPU

COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture

Fundamental Optimizations in CUDA Peng Wang, Developer Technology, NVIDIA

Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging, London, United Kingdom, 2

Threading Hardware in G80

Using OpenACC With CUDA Libraries

Technische Universität München. GPU Programming. Rüdiger Westermann Chair for Computer Graphics & Visualization. Faculty of Informatics

Breaking Through the Barriers to GPU Accelerated Monte Carlo Particle Transport

CSE 591: GPU Programming. Programmer Interface. Klaus Mueller. Computer Science Department Stony Brook University

Performance potential for simulating spin models on GPU

Transcription:

GPU-based high-performance computing for radiotherapy applications Julien Bert, PhD CHRU de Brest LaTIM INSERM UMR1101

Radiotherapy Irradiation of the tumor: Maximum dose to the tumor Healthy surrounding organs spared Radiotherapy Introduction Aims: Predict the dose Personalized planning Ensure quality and safety Accurate and realistic numerical simulation in radiation physics External Beam Radiotherapy Intra-Operative Radiotherapy 1

Monte Carlo method Estimated the value of π Monte Carlo simulation Method Uniformly scatter some objects of uniform size (rice or sand) over the square Monte Carlo Simulation Random sample must follow the laws of physics Simulate particle interactions with matter Monte Carlo Simulation 2

Monte Carlo simulation Clinical context Monte Carlo simulation Very computationally demanding limits their applicability in both research and clinical environment applications Computer cluster financial burden and availability issues Cost! Patient Workflow Workspace Intervention cost Space?! h-gate (2010)! 3

Computational power Monte Carlo simulation GPU 4 graphics cards x2 GPUs 1 249 processors 2 with x6 cores (1494 cores) =eq. 1 GPU core: 4 TFLOPS 1 CPU core: 20 GFL0PS High computational power at a reduce cost (1/20) 1 NVIDIA GTX Titan Z = 8.1 TFLOPS 2 Intel Core i7 980 x6 3.33 GHz = 130 GFLOPS 4

Programming language Monte Carlo simulation Parallel programming Writing a non-graphics program with a graphics API Architecture Algorithms 2003 Cg NVIDIA (Mark et al. ACM Trans. Graph.) 2004 Brook ATI (Buck et al. SIGGRAPH) 2004 OpenGL shading language (Kessenich et al. http://www.opengl.org) 2007 CUDA NVIDIA (Compute Unified Device Architecture) 5

Parallel programming Paradigm Monte Carlo simulation on GPU Paradigm one thread per history particles are independent easy to parallelize electron electron Tracking Emission End of simulation photon electron...! Particles buffer Thousands of particles are simulated in parallel 6

Divergence If statement Parallel programming Divergence Threads Begin If A photon B electron End 7

Physics Model Parallel programming Physics model Conseil européen pour la recherche nucléaire (CERN, Genève, Suisse) CERN: LHC, ATLAS Fermilab: MINOS SLAC: BaBar Geant4 code on GPU (C++! C! CUDA) 8

Types of memory Parallel programming Types of memory Storage type Register Shared memory Texture memory Constant memory Global memory Bandwith ~8 TB/s ~1.5 TB/s ~200 MB/s ~200 MB/s ~200 MB/s Latency 1 cycle 1 to 32 cycles ~400 to 600 ~400 to 600 ~400 to 600 Lifetime Function kernel application application application Range Thread Block All threads All threads All threads Size Very small 49kB / MP Depends on the card 64 kb Depends on the card Read/ Write R/W R/W R R R/W Cached No N/A Yes Yes No limited variables! geometry and physical tables! physical constants! particles! 9

Parallel programming PRNG Pseudo Random Generator Number (PRNG) Return a random number uniformly distributed between 0 and 1 PRNG period Seed 0 (initial) Seed 1 Seed 2 Random Random Random number number number Need to store generator state PRNG! PRNG! PRNG! PRNG! Find a suitable algorithm: - small state size - large period 10

Parallel programming Simulation loop Simulation loop Source kernel... Particles buffer Main simulation loop Simulation Kernel (all steps) 14 billions of particles... No Total number of particles requested? Yes exit ~ 2-4 GB 11

Data structure Parallel programming Data structure Array of Structure (AoS) struct Point { float x; float y; float z; }; list of points regular access is not optimized Particle data structure Energy Direction Position Structure of Array (SoA) struct Point { float *x; float *y; float *z; }; Point contains a list of x, y, z values coalesced access is optimized 12

Validation Parallel programming Validation Monte Carlo simulation on GPU (2012): Pseudo random number generator Electromagnetic effects for photon " Compton scattering " Rayleigh scattering " Photoelectric effect Voxelized geometry navigation Materials properties Single precision (float number) 2013 - Full agreement between GPU code and Geant4 (CPU) safety System (algorithms) robustness real-time IOP PUBLISHING Phys. Med. Biol. 58 (2013) 5593 5611 PHYSICS IN MEDICINE AND BIOLOGY doi:10.1088/0031-9155/58/16/5593 Geant4-based Monte Carlo simulations on GPU for medical applications Julien Bert 1,5,HectorPerez-Ponce 2,5,ZiadElBitar 3,Sébastien Jan 4, Yannick Boursier 2,DamienVintache 3,AlainBonissent 2, Christian Morel 2,DavidBrasse 3 and Dimitris Visvikis 1 1 LaTIM, UMR 1101 INSERM, CHRU Brest, Brest, France 2 CPPM, Aix-Marseille Université, CNRS/IN2P3, Marseille, France 3 IPHC, UMR 7178 CNRS/IN2P3, Strasbourg, France 4 DSV/I2BM/SHFJ, Commissariat àl EnergieAtomique,Orsay,France 13

Bugs Parallel programming Bug hunting Tracking bugs: printf (device code, cuda 3.1 2010) Nsight debugger Tracking device bugs Embedded system device host void MyFunction() CPU GPU 14

Science applications Parallel programming Science applications Floating point IEEE 754 (single precision) Sign! Exponent! Significand! Double precision available on GPU: slower computation Navigation (single precision) Dose deposition (double precision)?!?! de! de! de! de! Sum! very small (de) + very high (Tot E) = no change (Tot E)! Total energy deposited! 15

Clinical context Parallel programming Clinical context 2015 - Radiotherapy (brachytherapy) Radiotherapy (external beam) dosimetry in 2 s Phys. Med. Biol. 60 (2015) 4987 5006 Institute of Physics and Engineering in Medicine Physics in Medicine & Biology doi:10.1088/0031-9155/60/13/4987 GGEMS-Brachy: GPU GEant4-based Monte Carlo simulation for brachytherapy applications Yannick Lemaréchal 1, Julien Bert 1, Claire Falconnet 2, Philippe Després 3,4, Antoine Valeri 4,5, Ulrike Schick 1,2, Olivier Pradier 1,2, Marie-Paule Garcia 1, Nicolas Boussion 1,2 and Dimitris Visvikis 1 16

Parallel programming GGEMS GGEMS GPU GEant4-based Monte Carlo Simulations http://www.ggems.fr Intra-Operative Radiotherapy GGEMS is a library not a software. External Beam Radiotherapy Medical Imaging x80-x150 17

GGEMS Heterogeneous architecture Heterogeneous architecture GTX580 cc 2.0 512 cores (Fermi) GTX680 cc 3.0 1536 cores (Kepler) No Yes GTX Titan cc 3.5 2688 cores (Kepler) 37e Forum ORAP march 2016 18

Heterogeneous architecture GGEMS Heterogeneous architecture CUDAHOME := /usr/local/cuda/lib64 CUDALIBS := -L$(CUDAHOME)/lib -lcudart CUDAFLAGS := -gencode=arch=compute_30,code=sm_30 \ -gencode=arch=compute_32,code=sm_32 \ -gencode=arch=compute_35,code=sm_35 \ -gencode=arch=compute_37,code=sm_37 \ -gencode=arch=compute_50,code=sm_50 \ -gencode=arch=compute_52,code=sm_52 \ --compiler-options -w --std c++11 19

Heterogeneous architecture Hardware evolution: GGEMS Hardware evolution Current arch New arch Plug, compile and play Hardware upgrade have a cost!! CUDA, new features: updating the code using new graphics cards Instantly speed up your application 20

Conclusion Conclusion Main CUDA code is similar to the original one (C++) Architecture Algorithms Programming model within this context! 21

Conclusion Conclusion GPU-Accelerated Libraries cufft Fast Fourier Transforms Library cublas Complete BLAS library cusparse Sparse Matrix library curand Random Number Generator NPP Thousands of Performance Primitives for Image & Video Processing Thrust Templated Parallel Algorithms & Data Structures CUDA Math Library of high performance math routines PhysX scalable multi-platform game physics solution VisualFX solutions for complex, cinematic visual effects OptiX Fast ray tracing engine 22

Conclusion Conclusion Specifications: Developer friendly: - debugging - thread exception error 23

Conclusion Conclusion Personal Computer Computer? Personal Super Computer Cluster 24

Conclusion Conclusion Computer assisted medical intervention Image-guide radiotherapy Treatment planning 25

Questions? Julien Bert, PhD CHRU de Brest LaTIM INSERM UMR1101