GPU-accelerated ray-tracing for real-time treatment planning

Journal of Physics: Conference Series, OPEN ACCESS

To cite this article: H Heinrich et al 2014 J. Phys.: Conf. Ser

H Heinrich¹, P Ziegenhein¹, C P Kamerling¹, H Froening² and U Oelfke¹

¹ German Cancer Research Center (DKFZ), Heidelberg, Germany
² Institute of Computer Engineering, University of Heidelberg, Heidelberg, Germany

h.heinrich@dkfz.de

Abstract. Dose calculation methods in radiotherapy treatment planning require the radiological depth of the voxels that represent the patient volume in order to correct for tissue inhomogeneities. This information is obtained by time-consuming ray-tracing-based calculations. For treatment planning scenarios with changing geometries and real-time constraints, this is a severe bottleneck. We implemented a ray-tracing algorithm for the graphics processing unit (GPU) that uses a ray-matrix approach to reduce the number of rays to trace. Furthermore, we investigated the impact of different memory access strategies in the kernel implementations as well as strategies for rapid data transfers between main memory and the memory of the graphics device. Our study included overlapping computations and memory transfers using Hyper-Q to reduce the overall runtime. We tested our approach on a prostate case (9 beams, coplanar). The measured execution times for a complete ray-tracing range from 28 msec for the computations on the GPU alone to 99 msec when data transfers to and from the graphics device are included. Our GPU-based algorithm performed the ray-tracing in real time. The strategies efficiently reduce the time spent on memory accesses and the data transfer overhead. The achieved runtimes demonstrate the viability of this approach and allow improved real-time performance for dose calculation methods in clinical routine.

1. Introduction

The computation of radiological depth data is a vital preprocessing step for dose calculations in radiotherapy treatment planning.
Since the depth data represents the patient geometry, dose algorithms can use it to correct for tissue inhomogeneities along incident rays. The computation relies on time-consuming ray-tracing operations that are typically carried out for every voxel of the patient volume. In adaptive radiotherapy (ART) [1][2][3] treatment planning scenarios with changing patient geometries, this is a severe bottleneck. Our group investigates a new IMRT treatment planning paradigm (interactive dose shaping), which requires rapid access to radiological depth information for changing patient geometries [4][5].

GPUs are powerful high-core-count devices that are no longer used solely for graphics applications, but also for general-purpose computing tasks. Compared to CPU cores, their individual computing cores are weaker, but the large number of cores results in a significant performance increase. GPUs also offer a memory bandwidth that is about one order of magnitude higher than that of single-socket CPUs. However, GPUs only excel in performance if (1) their vast number of computing cores can be kept busy, and (2) they can perform the computations in-core, i.e. the input and output data associated with the computation are stored in on-device memory. Otherwise, performance can degrade due to data movements over the bandwidth-limited PCIe interface.

In this work we implemented a GPU-based ray-tracing algorithm using CUDA [6]. We investigated how different strategies for increasing memory access efficiency affect the runtime. This includes the potentially inefficient memory access patterns of the ray-tracing kernel functions to the on-device graphics memory as well as the overhead resulting from data transfers to and from the host's main memory. In addition, we investigated the benefits of using GPU capabilities for concurrent kernel execution and data movement. Results are obtained for a prostate patient data set.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.

2. Materials & Methods

2.1. The ray-tracing algorithm

Our ray-tracing implementation is based on the algorithm proposed by Siddon [7], which was adapted for the GPU by utilizing the stepping approach in [8] and the ray-matrix approach in [9]. A ray-matrix is defined at the iso-center of the treatment field, perpendicular to each incident beam. The size of the matrix is defined by the extension of the treatment field plus a scattering margin on each side of the beam. Each matrix element represents a point in 3D space. Rays are defined from the beam source through each matrix element, and tracing is performed until a ray exits the patient. The number of points is chosen so that the distance between two neighboring rays at the patient's exit plane is equal to the voxel size. The ray-matrix approach can greatly reduce the number of ray-tracing processes in comparison to tracing every voxel of the patient volume for each beam.
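The ray-matrix construction described above can be sketched on the CPU as follows. This is a minimal illustration, not the authors' implementation: it assumes a divergent beam from a point source, coplanar beams that rotate about the z axis, a square field, and an iso-center at the origin. The function names and the parameters `sad_mm` (source-to-iso-center distance) and `exit_distance_mm` (source-to-exit-plane distance) are hypothetical. The point spacing follows from similar triangles, so that neighboring rays are one voxel apart at the exit plane.

```python
import math


def isocenter_spacing(voxel_size_mm, sad_mm, exit_distance_mm):
    """Point spacing in the iso-center plane for a divergent beam.

    By similar triangles, rays that are voxel_size_mm apart at the exit
    plane are closer together by the factor sad / exit_distance at the
    iso-center plane.
    """
    return voxel_size_mm * sad_mm / exit_distance_mm


def build_ray_matrix(gantry_deg, sad_mm, field_mm, margin_mm, spacing_mm):
    """Return (source, points) for one coplanar beam, iso-center at origin.

    points[i][j] is the 3D position of matrix element (row i, column j).
    """
    phi = math.radians(gantry_deg)
    # Assumed convention: the gantry rotates about the z axis.
    to_source = (math.sin(phi), -math.cos(phi), 0.0)  # iso-center -> source
    u = (math.cos(phi), math.sin(phi), 0.0)           # in-plane column axis
    v = (0.0, 0.0, 1.0)                               # in-plane row axis
    source = tuple(sad_mm * c for c in to_source)

    extent = field_mm + 2.0 * margin_mm               # field plus both margins
    n = int(math.ceil(extent / spacing_mm)) + 1       # points per side
    offsets = [(k - (n - 1) / 2.0) * spacing_mm for k in range(n)]
    points = [[tuple(a * u[d] + b * v[d] for d in range(3)) for a in offsets]
              for b in offsets]
    return source, points


def point_on_ray(source, matrix_point, sad_mm, depth_mm):
    """Point on the ray source -> matrix_point at axial distance depth_mm."""
    t = depth_mm / sad_mm
    return tuple(s + t * (p - s) for s, p in zip(source, matrix_point))
```

Each ray is then fully described by `source` and one entry of `points`, which is what makes the rays independent units of work.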
In addition, every ray can be traced independently of every other ray. This fits the CUDA programming model well and allowed us to decompose the problem so that one thread processes the ray-tracing along one ray. For each beam we execute one kernel, which spawns a thread for each point in the corresponding ray-matrix.

2.2. Time-to-solution

We use time-to-solution (TTS) as the metric to evaluate the performance of our implementation. We define TTS as the accumulated runtime of the ray-tracing kernel functions on the GPU, the required input data transfer to the graphics device, and the transfer of the computed radiological depth data cubes back to the host's main memory, as illustrated in Figure 1.

Figure 1. Definition of time-to-solution.
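For the strictly sequential case, the TTS definition amounts to a simple sum. The sketch below states it explicitly; the function name and the idea of passing per-beam lists are illustrative assumptions, and any numbers passed in would have to come from measurements.

```python
def time_to_solution(h2d_ms, kernel_ms, d2h_ms):
    """Sequential TTS in msec.

    h2d_ms    -- time of the single input volume transfer to the device
    kernel_ms -- list with one kernel runtime per beam
    d2h_ms    -- list with one output cube transfer time per beam
    """
    if len(kernel_ms) != len(d2h_ms):
        raise ValueError("expected one kernel and one output transfer per beam")
    return h2d_ms + sum(kernel_ms) + sum(d2h_ms)
```

This sum only holds when nothing overlaps. Once kernels and transfers run concurrently, the TTS is the wall-clock span from the start of the input transfer to the end of the last output transfer.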

2.3. Acceleration strategies

GPUs employ an explicit memory hierarchy that is exposed to the user by NVIDIA's CUDA framework. Whether data is accessed from, e.g., global or texture memory affects the utilization of the computational resources of the graphics hardware. Global memory is the main memory of NVIDIA graphics devices. It is cached (not to reduce access latency, but to reduce contention) and reaches its best performance for coalesced access patterns, i.e. when threads access consecutive memory addresses. In contrast, texture memory is optimized for accessing grid data and reaches its best performance for access patterns with spatial locality.

Our initial implementation of the ray-tracing kernel, the global memory kernel (GMK), accesses the input data stored in global memory. As acceleration strategy (1) we implemented a second version, the texture memory kernel (TMK), which accesses the input data provided as a 3D texture. This strategy focuses on the data transfer between graphics memory and computational units.

The input and output data for both the GMK and the TMK implementation are transferred to and from paged host memory. This is a potential bottleneck, as demand paging might swap pages out, which increases access times. For this reason, the CUDA driver has to copy the data from the paged memory to a pinned memory buffer before it can invoke the host-to-device (H2D) and device-to-host (D2H) data transfers using direct memory access (DMA). Allocating the host memory as pinned (page-locked) memory prevents demand paging and therefore enables the GPU to access the host's main memory directly using DMA without copy overhead. Strategy (2) focuses on the data transfer between the host's main memory and the device memory. We enhanced our implementation of both the GMK and the TMK by using page-locked memory on the host side.
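Why the efficiency of global memory accesses in strategy (1) can depend on the beam angle may be illustrated with a small address calculation. The sketch assumes a row-major volume with x as the fastest index and coplanar beams rotating about the z axis, and it ignores beam divergence; none of these details are stated in the text, and the function names are hypothetical. It computes how far apart in memory the voxels are that two neighboring rays of one matrix row sample at the same step.

```python
import math


def linear_index(x, y, z, nx, ny):
    """Offset of voxel (x, y, z) in a row-major volume with x fastest."""
    return x + nx * (y + ny * z)


def neighbor_thread_stride(gantry_deg, nx, ny):
    """Address stride (in elements) between the voxels sampled by two
    neighboring rays of one matrix row at the same step."""
    phi = math.radians(gantry_deg)
    # Row direction of the ray-matrix, rounded to whole voxel steps.
    dx, dy = round(math.cos(phi)), round(math.sin(phi))
    return abs(linear_index(dx, dy, 0, nx, ny) - linear_index(0, 0, 0, nx, ny))
```

Under these assumptions a beam at 0° gives a stride of one element for a 256 x 256 slice, which suits coalescing, whereas a beam at 90° gives a stride of 256 elements. A texture fetch, being optimized for spatial locality on the grid, is less sensitive to this orientation.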
The expected benefit is an approximately 2x higher memory bandwidth for the input/output data transfers and thus a reduction of the TTS.

Strategy (3) focuses on overlapping computational workloads and data transfers. This can be accomplished using Hyper-Q, a novel feature introduced with CUDA 5.0 that is available on NVIDIA's Tesla K20 graphics hardware of the current Kepler architecture [10]. To utilize Hyper-Q, data transfers as well as kernel invocations have to be implemented for asynchronous execution, i.e. the control flow returns to the CPU immediately after invocation. Dependencies between asynchronous invocations can be expressed with streams, which are first-in-first-out queues of work packets. Separate streams can be processed concurrently by the graphics hardware if sufficient resources are available. The actual scheduling of the different computational tasks and data transfers is carried out transparently to the user. We enhanced our implementation with asynchronous data transfers and kernel invocations and organized independent tasks in separate streams to exploit Hyper-Q. It is expected that the computations and memory transfers of multiple beams overlap, which can contribute to a reduction in TTS. Table 1 provides a quick overview of the different strategies and versions of the ray-tracing implementation.

Table 1. Overview of strategies to reduce the TTS.

Strategy (1)   Enhance data transfer between graphics memory and computational units   GMK vs. TMK, both implemented using paged host memory
Strategy (2)   Enhance data transfer between host's main memory and graphics memory     Paged vs. page-locked memory, implemented for both GMK and TMK
Strategy (3)   Enhance concurrency of data transfers and computation                    Hyper-Q, implemented for both GMK and TMK

2.4. Development & test environment

The proposed acceleration strategies were implemented in CUDA 5.0 and C/C++ using MS Visual Studio. We tested the implementations on a Win7 workstation equipped with an Intel Xeon E5 CPU with 64 GByte RAM and an NVIDIA Tesla K20c with 5 GByte graphics memory. We used an IMRT treatment plan for a prostate patient with 9 coplanar beams and 256 x 256 x 234 voxels of size (2.62 mm)³ as test data set. The ray-matrix for each of the 9 beams covers the beam extension plus an 8 cm scattering margin on each side. The ray-matrix dimensions range from 140 x 140 to 186 x 186 points. To test the accuracy of our GPU implementations, we used the serial CPU implementation of the ray-tracing algorithm from [9] as reference.

3. Results

The results of our GPU implementations were compared to the reference implementation. Both produce equal results within single-precision accuracy. The runtimes measured for the different versions of our ray-tracing implementation are listed in Table 2. Columns 3 to 5 present the runtimes in msec for the kernel execution on the GPU, the H2D and D2H data transfers, and the TTS. The runtimes are accumulated over 9 kernel invocations, a single data transfer of the input volume and the 9 data transfers of the output volumes. For the Hyper-Q versions, however, the kernel execution time represents the time from the start of the first kernel until the last kernel has finished.

Table 2. Runtimes of the ray-tracing implementations in msec for a 9 beam prostate case. The presented runtimes are accumulated over 9 beams.
Runtimes [msec], 9 beams   Kernel   Kernel execution   Data transfer H2D / D2H   Time to solution
Paged memory               GMK                          /
                           TMK                          /
Page-locked memory         GMK                          /
                           TMK                          /
Hyper-Q                    GMK                          /
                           TMK                          /

A snippet of the output of NVIDIA's Visual Profiler, normalized to an identical timeline, is shown in Figure 2. It gives intuitive visual access to the workflow on the GPU and to the impact of the implemented acceleration strategies. The golden boxes represent the time consumption of data transfers between host and graphics hardware. The aqua blue boxes show the runtime of the GMK and the purple boxes that of the TMK. For each version, the first golden box depicts the single input data transfer to the graphics device, and the remaining ones account for the transfers of the output cubes to the host's main memory, one cube per beam.
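The effect that overlapping has on such a timeline can be approximated with a simple pipeline model. The sketch below is a deliberately simplified assumption, not a description of the actual hardware scheduling: kernels run back to back on the device, a single copy engine handles the output transfers, and the output transfer of a beam may start as soon as its kernel has finished and the previous transfer is done. The function name and all timings used with it are hypothetical.

```python
def overlapped_tts(h2d_ms, kernel_ms, d2h_ms):
    """Wall-clock TTS in msec when output transfers overlap later kernels.

    Simplified model: all kernels wait for the input transfer, kernels
    execute one after another, and one copy engine serializes the
    device-to-host transfers.
    """
    kernel_done = h2d_ms   # time at which the most recent kernel finished
    copy_done = h2d_ms     # time at which the copy engine becomes free
    for k, c in zip(kernel_ms, d2h_ms):
        kernel_done += k
        # A transfer needs both its kernel result and a free copy engine.
        copy_done = max(copy_done, kernel_done) + c
    return copy_done
```

With made-up values of 10 msec for the input transfer, 3 msec per kernel and 8 msec per output transfer for 9 beams, the model gives 85 msec, compared with 109 msec when all steps run strictly one after another. In this regime the transfers dominate, so differences in kernel runtime are largely hidden behind them.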

Figure 2. NVIDIA Visual Profiler output for a 9 beam prostate case. The colored boxes represent the runtimes of the different versions of the ray-tracing implementation: aqua blue for the GMK, purple for the TMK, the respective first golden box for the input data transfer and the remaining golden boxes for the output data transfers to the host's main memory.

4. Discussion

We investigated three acceleration strategies for enhancing the ray-tracing performance of the radiological depth computation for real-time treatment planning. Applying these strategies can enhance the efficiency of the IMRT treatment planning process, in particular when changing patient geometries are considered. We defined the TTS metric to assess the benefit of offloading the ray-tracing to the graphics device for our in-house TPS framework [4][5]. However, many other applications also require the results back in the host's memory and can benefit from the proposed strategies.

Strategy (1) focuses on the impact of using texture memory as opposed to global memory for the ray-tracing kernel implementations. The TMK showed better performance, since the memory access patterns of the ray-tracing are based on spatial locality rather than temporal locality. This difference is especially prominent for beam angles that require data access patterns which do not map well to caching strategies based on locality of reference. The profiler output shows smaller purple boxes than aqua blue boxes and indicates that the TMK performance is largely independent of the beam angle.

For strategy (2), we enhanced the implementations by using page-locked host memory with both kernels. The performance gain is visible in the profiler output as reduced sizes of the golden boxes. The observed memory bandwidth across the PCIe interface was stable above 6 GByte/sec, which amounts to approximately a factor of 2 compared to the implementations using paged host memory.
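To get a feeling for what this bandwidth means for a single volume, the transfer time can be estimated from the volume size. The sketch assumes 4 bytes per voxel (single precision) and 1 GByte = 10^9 bytes; neither is stated explicitly in the text, and the function names are hypothetical.

```python
def volume_bytes(nx, ny, nz, bytes_per_voxel=4):
    """Size of a voxel volume in bytes (4-byte voxels are an assumption)."""
    return nx * ny * nz * bytes_per_voxel


def transfer_ms(n_bytes, bandwidth_gbyte_per_s):
    """Idealized transfer time in msec, ignoring latency and setup cost."""
    return n_bytes / (bandwidth_gbyte_per_s * 1e9) * 1e3
```

Under these assumptions the 256 x 256 x 234 test volume occupies about 61 MByte, which takes roughly 10 msec to transfer at 6 GByte/sec and about twice as long at half that bandwidth.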

Strategy (3) leads to the largest reduction in TTS. It overlaps kernel executions and output data transfers back to the host using Hyper-Q. We did not expect both concurrent implementations, GMK and TMK, to achieve the same TTS. This indicates that the rather inefficient global memory accesses of the GMK were effectively hidden by Hyper-Q.

Combining the strategies, we were able to compute the ray-tracing related tasks in 28 msec for 9 treatment beams in clinical resolution using graphics hardware of NVIDIA's current Kepler architecture, exploiting its capability for concurrent kernel execution and data movement. Including the data transfers, we obtained a TTS of 99 msec, which corresponds to a speed-up of 3 in comparison to the initial implementations of GMK and TMK using paged host memory.

Future work will include the integration of multi-GPU support to study the scaling behavior of our implementation. Furthermore, we will investigate the application of the GPU-based ray-tracing in adaptive treatment planning scenarios for rapid response to changing patient geometries.

5. Conclusion

We have shown that the computation of radiological depth data for each beam and each voxel of a patient volume can be carried out in real time using graphics hardware. The memory bandwidth between the host and the device is the limiting factor for ray-tracing on the GPU. The presented strategies minimize the time consumption of memory accesses and data movements using novel features of NVIDIA's Kepler GPU architecture. We obtained the best time-to-solution by combining our texture memory kernel with Hyper-Q.

References

[1] Yan D, Vicini F, Wong J and Martinez A 1997 Adaptive radiation therapy Phys. Med. Biol.
[2] Wu C, Jeraj R, Olivera G H and Mackie T R 2002 Re-optimization in adaptive radiotherapy Phys. Med. Biol.
[3] de la Zerda A, Armbruster B and Xing L 2007 Formulating adaptive radiation therapy (ART) treatment planning into a closed-loop control framework Phys. Med. Biol.
[4] Ziegenhein P, Kamerling C P and Oelfke U 2013 Interactive Dose Shaping - Efficient Strategies for CPU-based Real-Time Treatment Planning J. Phys.: Conf. Ser. presented at ICCR 2013, not yet published
[5] Kamerling C P, Ziegenhein P, Heinrich H and Oelfke U 2013 A 3D isodose manipulation tool for interactive dose shaping J. Phys.: Conf. Ser. presented at ICCR 2013, not yet published
[6] NVIDIA CUDA Compute Unified Device Architecture Programming Guide 5.0 edition 2012 NVIDIA Corp., Santa Clara, CA
[7] Siddon R L 1985 Fast calculation of the exact radiological path for a three-dimensional CT array Med. Phys.
[8] de Greef M, Crezee J, van Eijk J C, Pool R and Bel A 2009 Accelerated ray tracing for radiotherapy dose calculations on a GPU Med. Phys. 36(9)
[9] Siggel M, Ziegenhein P, Nill S and Oelfke U 2012 Boosting runtime-performance of photon pencil beam algorithms for radiotherapy treatment planning Physica Medica 28(4)
[10] Whitepaper 2012 NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110 NVIDIA Corp., Santa Clara, CA


Towards Breast Anatomy Simulation Using GPUs Towards Breast Anatomy Simulation Using GPUs Joseph H. Chui 1, David D. Pokrajac 2, Andrew D.A. Maidment 3, and Predrag R. Bakic 4 1 Department of Radiology, University of Pennsylvania, Philadelphia PA

More information

CUDA. Matthew Joyner, Jeremy Williams

CUDA. Matthew Joyner, Jeremy Williams CUDA Matthew Joyner, Jeremy Williams Agenda What is CUDA? CUDA GPU Architecture CPU/GPU Communication Coding in CUDA Use cases of CUDA Comparison to OpenCL What is CUDA? What is CUDA? CUDA is a parallel

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

Improvement and Evaluation of a Time-of-Flight-based Patient Positioning System

Improvement and Evaluation of a Time-of-Flight-based Patient Positioning System Improvement and Evaluation of a Time-of-Flight-based Patient Positioning System Simon Placht, Christian Schaller, Michael Balda, André Adelt, Christian Ulrich, Joachim Hornegger Pattern Recognition Lab,

More information

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms CS 590: High Performance Computing GPU Architectures and CUDA Concepts/Terms Fengguang Song Department of Computer & Information Science IUPUI What is GPU? Conventional GPUs are used to generate 2D, 3D

More information

Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing

Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing Real-Time Rigid id 2D-3D Medical Image Registration ti Using RapidMind Multi-Core Platform Georgia Tech/AFRL Workshop on Computational Science Challenge Using Emerging & Massively Parallel Computer Architectures

More information
