Georgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing
1 Real-Time Rigid 2D-3D Medical Image Registration Using RapidMind Multi-Core Platform. Georgia Tech/AFRL Workshop on Computational Science Challenge Using Emerging & Massively Parallel Computer Architectures, Georgia Institute of Technology, August 17, 2009. Justin W. L. Wan, Canada Research Chair in Scientific Computing, David R. Cheriton School of Computer Science, University of Waterloo. Joint work with Lin Xu (Princess Margaret Hospital).
2 Outline: rigid (2D-3D) image registration; modeling and numerical solution; multi-core/GPGPU programming; RapidMind Multi-Core Development Platform; results.
3 Medical Imaging
4 Image Registration. [Figure: template, target, transformed template] Find a transformation that best maps the template image to the target image. Best match: minimize differences or maximize similarities.
5 Applications. Diagnosis: combine different information from multiple imaging modalities. Monitoring disease progression: align tissues/organs that have changed in size, shape, or position over time. Image-guided surgery or radiotherapy: align pre-operative images and surgical plans to images obtained in real time during surgery. Patient comparison: compare an individual's anatomy to a standardized atlas.
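6 Image Registration Problem. An image is a mapping from the image domain to the intensity range: template image $F: \Omega_F \to \mathbb{R}$, target image $G: \Omega_G \to \mathbb{R}$. Find a transformation $\Phi: \Omega_G \to \Omega_F$ such that corresponding points in F and G are aligned, i.e. $F(\Phi(x))$ matches $G(x)$. [Figure: a point x in G is mapped to y = Φ(x) in F]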
7 Classification. Based on the type of transformation: rigid (translation + rotation); nonrigid (affine, projective, and physically based models such as elastic and fluid).
8 Rigid Image Registration Only rotations and translations are considered.
9 Rigid Image Registration. Used for registration where there is no or very little distortion; often serves as a pre-alignment step for nonrigid registration. 2D-2D: composed of 2 translations and 1 rotation. 3D-3D: composed of 3 translations and 3 rotations. 2D-3D image registration aligns a 2D image with a 3D image volume; it is important in medical applications such as radiation therapy and computer-assisted surgery.
10 2D-3D Image Registration. [Figure: real world vs. virtual world] The position of the patient can be estimated when the digitally reconstructed radiograph (DRR) matches the portal X-ray image.
11 Rigid Image Registration Model.
$$\min_s \|F(s) - G\|_2^2 = \min_s \sum_{i,j} \left(F(s)_{i,j} - G_{i,j}\right)^2,$$
where $s = (\theta_x, \theta_y, \theta_z, t_x, t_y, t_z)$ are the transformation parameters (three rotation angles and three translations), $F(s)$ is the digitally reconstructed radiograph (DRR), and $G$ is the portal image. Match the DRR and the portal image by appropriately rotating and translating the 3D volume. Two major steps: construction of the DRR, and solution of the minimization problem.
12 Construction of DRR DRR is constructed by perspective projection of 3D image volume onto a given plane.
13 Volume Rendering of 3D Image. Ray casting: light rays pass from the source through the pixels of the DRR into the 3D image volume. Pixel values = accumulated intensities of the 3D image along each ray. Intensities at a sample point x are given by trilinear interpolation.
14 Solving the Registration Minimization Problem. Gauss-Newton method. Let $s^n$ be the previous approximation and let $s^{n+1} = s^n + \Delta s$. The nonlinear least-squares problem is
$$\min_{\Delta s} \sum_{i,j} \left(F(s^n + \Delta s)_{i,j} - G_{i,j}\right)^2.$$
Taylor expansion:
$$F(s^n + \Delta s)_{i,j} \approx F(s^n)_{i,j} + \nabla F(s^n)_{i,j}^T \Delta s.$$
The approximate linear least-squares problem:
$$\min_{\Delta s} \sum_{i,j} \left(F(s^n)_{i,j} + \nabla F(s^n)_{i,j}^T \Delta s - G_{i,j}\right)^2.$$
15 Gauss-Newton Method. The linear least-squares problem can be written as
$$\min_x \|Ax - b\|_2^2,$$
where $A = \nabla F(s^n)$ is an $N^2 \times 6$ matrix, $b = G - F(s^n)$ is an $N^2 \times 1$ vector, and $x = \Delta s$ is a $6 \times 1$ vector. The linear least-squares problem is solved by the normal equations
$$A^T A x = A^T b.$$
The procedure is repeated until $s^n$ converges. Forming $A$, $A^T A$, and $A^T b$ is computationally expensive.
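16 Parallel Computation of A and A^T A. The entries of A are image gradients at pixel (i, j). The gradients are approximated by finite differences, e.g.
$$\frac{\partial F(s^n)_{i,j}}{\partial \theta_x} \approx \frac{F(s^n + \Delta s_x)_{i,j} - F(s^n)_{i,j}}{\Delta \theta_x}, \qquad \Delta s_x = (\Delta\theta_x, 0, 0, 0, 0, 0).$$
$F(s^n + \Delta s_x)$ is obtained from the DRR by changing the rotation angle $\theta_x \to \theta_x + \Delta\theta_x$ while keeping the other parameters fixed. The DRRs, as well as the subtractions and divisions, are computed in parallel. The six columns of A are $\partial F/\partial\theta_x$, $\partial F/\partial\theta_y$, $\partial F/\partial\theta_z$, $\partial F/\partial t_x$, $\partial F/\partial t_y$, $\partial F/\partial t_z$. To compute each entry of $A^T A$, take two derivative arrays, multiply the corresponding elements in parallel, and then use a reduction operation to compute the global sum.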
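17 2D-3D Image Registration on GPU. [Figure: workflow] DRRs are generated from the current parameters s and compared with the portal image; A, $A^T A$, and $A^T b$ are formed in parallel on the GPU; the matrix and vector computation step then solves $A^T A x = A^T b$ and updates s.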
18 Parallel Systems. PC clusters: powerful, but not portable and hard to maintain. [Figure: systems rated at 1.1 PF/s and 1.06 PF/s]
19 Multi-Core Systems: dual- and quad-core PCs; Cell Broadband Engine; Graphics Processing Units (GPUs).
20 Multi-Core Processors: specifications of the Core i7 960 (45 nm) and the GTX285 (55 nm).
Processing elements: Core i7 960, 4 cores, 4-way SIMD; GTX285, 30 cores, 8-way SIMD.
Resident strands/threads (max): Core i7 960, 4 cores × 2 threads × 4-way SIMD = 32 strands; GTX285, 30 cores × 32 SIMD vectors × 32-way SIMD = 30,720 threads.
Memory bandwidth: Core i7 960, 25.6 GB/s; GTX285, 159 GB/s.
Local store: GTX285, 480 KB.
21 Cell Broadband Engine Processors
22 GPU for Computing. GPUs have evolved into very flexible and powerful processors: they are programmable using high-level languages; they support 32-bit floating-point precision; they offer lots of GFLOPS [Figure: GFLOPS over time for G80 = GeForce 8800 GTX, G71 = GeForce 7900 GTX, G70 = GeForce 7800 GTX, NV40 = GeForce 6800 Ultra, NV35 = GeForce FX 5950 Ultra, NV30 = GeForce FX 5800]; and there is a GPU in every PC and workstation.
23 GPU: Graphics Processing Unit. The GPU is specialized for compute-intensive, highly data-parallel computation (exactly what graphics rendering is about). [Figure: CPU with control logic, cache, a few ALUs, and DRAM vs. GPU with many ALUs and DRAM] Low-latency floating-point (FP) computation. Applications: game effects, physics, image processing; physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting.
24 GPU Programming Model: CPU (host); GPU with local DRAM (device).
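25 GPU Programming Language: CUDA. An integrated host+device C program alternates serial code on the host with parallel kernels on the device: serial code (host); parallel kernel (device) KernelA<<<nblk, ntid>>>(args); ... serial code (host); parallel kernel (device) KernelB<<<nblk, ntid>>>(args); ... Each launch runs nblk blocks of ntid threads each.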
26 Grids, Blocks, Threads, and Memory. [Figure: CUDA execution and memory model] The host launches kernels (Kernel 1, Kernel 2) on the device; each kernel executes as a grid (Grid 1, Grid 2) of thread blocks, e.g. Block (0, 0) through Block (2, 1), and each block, e.g. Block (1, 1), is an array of threads, Thread (0, 0) through Thread (4, 2). Each block has its own shared memory; each thread has its own registers and local memory; all threads, and the host, access global, constant, and texture memory.
27 RapidMind Overview. RapidMind provides: 1. a flexible platform that allows an arbitrary algorithm to be expressed and efficiently mapped to both multi-core CPUs and GPUs; 2. accelerated volume processing components that provide core building blocks for medical imaging applications. Copyright 2009 RapidMind Inc.
28 Trends in architectures: massive parallelism, heterogeneity, and hybrid computing. RapidMind provides portability, scalability, and future-proofing.
29 RapidMind System Architecture. API: intuitive, integrates with C++, and requires no new tools or workflow. Platform: the Code Optimizer analyzes and optimizes computations to remove overhead; the Load Balancer plans and synchronizes work to keep all cores fully utilized; the Data Manager reduces data bottlenecks; Logging/Diagnostics detects and reports performance bottlenecks. Processor Support Modules: x86 processors from AMD and Intel; ATI/AMD and NVIDIA GPUs; Cell Blade, Cell Accelerator Board, PS3. Copyright 2008, RapidMind, Inc.
30 RapidMind Programming Model. RapidMind Collection: standard C++ code using the RapidMind interface is built with standard C++ tools into a standard executable with embedded RapidMind operations; the interface extracts computation expressed in C++ while eliminating overhead. RapidMind Compilation: the code generator creates native, platform-specific machine code. RapidMind Execution: the runtime tightly couples multiple optimizations and manages streaming, massively parallel execution over multiple cores of a multicore processor.
31 RapidMind Basic Types. Value: container for fixed-length data. Array: container for variable-sized multidimensional data. Program: container for computations.
32 RapidMind Values. Value<N, T> is parameterized by the tuple size N (1 to 4) and the element type T (e.g. half, double, float, int); for example, Value<3, float>.
33 RapidMind Values (shorthand): Value1h, Value2d, Value3f, Value4i, where the digit is the tuple size and the letter the element type (h = half, d = double, f = float, i = int).
34 RapidMind Arrays. Array<D, T> is parameterized by the dimensionality D (1 to 3) and the item type T (e.g. Value4d, Value3f, Value2i); for example, Array<2, Value3f>.
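35 RapidMind Programs. A program has a declaration, an interface (its In and Out parameters), and a definition (the computation):
    Program p;                  // declaration
    p = BEGIN {
        In<Value3f> a, b;       // interface: inputs
        Out<Value3f> c;         // interface: output
        Value3f d = f(a, b);    // definition: the computation
        c = d + a * 2.0f;
    } END;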
36 Program Application. Apply programs to arrays to get new arrays: C = p(a,b); This invokes parallel execution: all array elements are updated simultaneously.
37 RapidMind Platform Interface Summary. Usage: include the platform header and link to the runtime library. Data: Values and Arrays provide a data abstraction. Programs: defined dynamically and executed on coprocessors; they provide a code abstraction.
    #include <rapidmind/platform.hpp>
    using namespace rapidmind;

    // Data
    Value1f f = 2.0f;
    Array<2,Value3f> a(512,512);
    Array<2,Value3f> b(512,512);

    // Program
    Program prog = BEGIN {
        In<Value3f> r, s;
        Out<Value3f> q;
        q = (r + s) * f;
    } END;

    // Execution
    a = prog(a,b);
    f = 3.0f;
    stride(a,2,2) = prog(
        slice(a,0,255,0,255),
        slice(b,256,511,0,255));
38 Numerical Experiments. 3D image volumes: artificial data, a white cube (128 × 128 × 128); clinical 3D CT data (Univ. of Iowa Health Care), a tripod fracture of a skull (128 × 128 × 100). For the artificial data, the template 2D image is simulated by projecting the 3D volume data with known parameters. Hardware: a standard PC with an NVIDIA GeForce 8800 GTX. [Figures: artificial data; real CT data]
39 Numerical Results (Artificial data). [Table: time per iteration (sec) and number of iterations versus resolution, for the C++ and RapidMind codes; the surviving iteration counts are 6 and 6.]
40 Numerical Results (Clinical data). Total time (sec) and number of iterations of the C++ and RapidMind codes for three sets of portal image parameters:
Rotations (2, 2, 2), translations (2 mm, 2 mm, 2 mm): 5 iterations (C++), 5 iterations (RapidMind).
Rotations (4, 4, 4), translations (4 mm, 4 mm, 4 mm): 8 iterations (C++), 8 iterations (RapidMind).
Rotations (6, 6, 6), translations (6 mm, 6 mm, 6 mm).
41 Numerical Results (Comparison). [Figure: timing (sec)]
42 Conclusion. We have developed an efficient 2D-3D rigid image registration method that is amenable to GPU processing. We implemented the algorithm using RapidMind to exploit the massive parallelism of GPUs. Numerical results show that the GPU code is 100 times faster than the CPU code. For real image datasets, 2D-3D image registration takes around 3 seconds.
43 Challenges. Portability: a standard (something like MPI) is needed for programming GPUs, the Cell, etc. Floating-point precision. Math libraries such as BLAS, LAPACK, and FFT.
3D Registration based on Normalized Mutual Information Performance of CPU vs. GPU Implementation Florian Jung, Stefan Wesarg Interactive Graphics Systems Group (GRIS), TU Darmstadt, Germany stefan.wesarg@gris.tu-darmstadt.de
More informationAbstract. Introduction. Kevin Todisco
- Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image
More informationAn Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection
An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,
More informationLarge scale Imaging on Current Many- Core Platforms
Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,
More informationChapter 6. Parallel Processors from Client to Cloud. Copyright 2014 Elsevier Inc. All rights reserved.
Chapter 6 Parallel Processors from Client to Cloud FIGURE 6.1 Hardware/software categorization and examples of application perspective on concurrency versus hardware perspective on parallelism. 2 FIGURE
More informationAbout Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS.
About Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS. Phoenix FD core SIMULATION & RENDERING. SIMULATION CORE - GRID-BASED
More informationTesla GPU Computing A Revolution in High Performance Computing
Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory
More informationCS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST
CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter
More informationExploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology
Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation
More informationDense matching GPU implementation
Dense matching GPU implementation Author: Hailong Fu. Supervisor: Prof. Dr.-Ing. Norbert Haala, Dipl. -Ing. Mathias Rothermel. Universität Stuttgart 1. Introduction Correspondence problem is an important
More informationHigh Performance Computing and GPU Programming
High Performance Computing and GPU Programming Lecture 1: Introduction Objectives C++/CPU Review GPU Intro Programming Model Objectives Objectives Before we begin a little motivation Intel Xeon 2.67GHz
More informationCurrent Trends in Computer Graphics Hardware
Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)
More informationScientific Computations Using Graphics Processors
Scientific Computations Using Graphics Processors Blair Perot Ali Khajeh-Saeed Tim McGuiness History Kevin Bowers, X Division Los Alamos Lab (2003) Lots of Memory Uses Memory Banks Cheap (commodity) Relativistic
More informationCS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS
CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in
More informationSplotch: High Performance Visualization using MPI, OpenMP and CUDA
Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,
More informationNumerical Algorithms on Multi-GPU Architectures
Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationGPU Accelerating Speeded-Up Robust Features Timothy B. Terriberry, Lindley M. French, and John Helmsen
GPU Accelerating Speeded-Up Robust Features Timothy B. Terriberry, Lindley M. French, and John Helmsen Overview of ArgonST Manufacturer of integrated sensor hardware and sensor analysis systems 2 RF, COMINT,
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationHigh-Performance Computing Using GPUs
High-Performance Computing Using GPUs Luca Caucci caucci@email.arizona.edu Center for Gamma-Ray Imaging November 7, 2012 Outline Slide 1 of 27 Why GPUs? What is CUDA? The CUDA programming model Anatomy
More informationGeneral Purpose Computing on Graphical Processing Units (GPGPU(
General Purpose Computing on Graphical Processing Units (GPGPU( / GPGP /GP 2 ) By Simon J.K. Pedersen Aalborg University, Oct 2008 VGIS, Readings Course Presentation no. 7 Presentation Outline Part 1:
More informationTurbostream: A CFD solver for manycore
Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware
More informationGPU & High Performance Computing (by NVIDIA) CUDA. Compute Unified Device Architecture Florian Schornbaum
GPU & High Performance Computing (by NVIDIA) CUDA Compute Unified Device Architecture 29.02.2008 Florian Schornbaum GPU Computing Performance In the last few years the GPU has evolved into an absolute
More information