Fast Compressive Sensing MRI Reconstruction using multi-gpu system
1 Fast Compressive Sensing MRI Reconstruction using multi-gpu system
Tran Minh Quan, Won-Ki Jeong
High-performance Visual Computing Laboratory, School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST), UNIST-gil 50 (100 Banyeon-ri), Eonyang-eup, Ulju-gun, Ulsan Metropolitan City, Republic of Korea
2 Talk Overview
- Introduction: 2D dynamic compressive sensing MRI (CS MRI)
- Split Bregman (SB) method
  - Total variation
  - Fast 2D discrete wavelet transform (DWT) with mixed-band
- Results on a single-GPU system
- Results on a multi-GPU system
- Q&A
3 2D Dynamic MRI (2.5D MRI)
Examples: cardiac MRI, perfusion MRI
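4 MRI Reconstruction
Acquisition model: $f = Ku$ with $K = RF$, where $R$ is the sampling mask and $F$ is the 2D Fourier transform.
- Traditional MRI: fully sampled k-space, reconstructed with a 2D inverse FFT (IFFT2).
- Zero-filling reconstruction: undersampled k-space with the missing samples set to zero, followed by IFFT2.
- CS MRI: undersampled k-space reconstructed by an iterative compressive sensing solver.
IFFT2-based reconstruction is very fast; CS reconstruction is very slow.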
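As a concrete illustration of $f = Ku$ with $K = RF$, here is a minimal CPU sketch in C++ (not the talk's GPU code). It assumes a square n x n image, uses a naive 2D DFT in place of CUFFT, represents $R$ as a boolean mask over the full k-space grid (true = acquired sample), and computes the zero-filling reconstruction $F^{-1}R^T f$. The names dft2, sample and zeroFill are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;
using Image = std::vector<cplx>;  // n x n image or k-space grid, row-major

// Naive O(n^4) 2D DFT: sign = -1 is the forward transform (unnormalized),
// sign = +1 is the inverse transform (scaled by 1 / (n * n)).
Image dft2(const Image& in, int n, int sign) {
    const double pi = std::acos(-1.0);
    Image out(n * n);
    for (int ky = 0; ky < n; ++ky)
        for (int kx = 0; kx < n; ++kx) {
            cplx acc = 0.0;
            for (int y = 0; y < n; ++y)
                for (int x = 0; x < n; ++x) {
                    double phase = sign * 2.0 * pi * double(ky * y + kx * x) / n;
                    acc += in[y * n + x] * std::polar(1.0, phase);
                }
            out[ky * n + kx] = (sign > 0) ? acc / double(n * n) : acc;
        }
    return out;
}

// K = R F: forward DFT, then the mask R zeroes every sample that was not acquired.
Image sample(const Image& u, const std::vector<bool>& mask, int n) {
    assert(mask.size() == u.size());
    Image k = dft2(u, n, -1);
    for (int i = 0; i < n * n; ++i)
        if (!mask[i]) k[i] = 0.0;
    return k;
}

// Zero filling: missing k-space samples stay 0, then an inverse DFT (F^{-1} R^T f).
Image zeroFill(const Image& f, const std::vector<bool>& mask, int n) {
    Image k(n * n, cplx(0.0));
    for (int i = 0; i < n * n; ++i)
        if (mask[i]) k[i] = f[i];
    return dft2(k, n, +1);
}
```

With a full mask the round trip returns the original image; with only the DC sample kept, zero filling returns a constant image equal to the image mean, which shows that unacquired k-space content is simply lost rather than recovered.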
5 Motivation
- Why sparse sampling? It greatly reduces scanning time (~16x), from ~20-40 minutes down to ~1-3 minutes.
- Why GPUs? To speed up reconstruction.
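6 CS MRI Problem
$\min_u J(u)$ s.t. $\sum_i \|K u_i - f_i\|_2^2 < \mu$, where $x$ and $y$ are the spatial axes and $z$ is the temporal axis.
- Lustig et al.: $J(u) = \|F_z W_{xy} u\|_1$
- Goldstein et al.: $J(u) = \|\nabla_{xyz} u\|_1$
- Our method: $J(u) = \|W_{xy} u\|_1 + \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1$
This is an $\ell_1$ minimization problem.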
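To make the proposed objective concrete, the sketch below evaluates $J(u) = \|W_{xy} u\|_1 + \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1$ on a small real-valued volume. Several choices are assumptions for illustration, not taken from the slides: forward differences with a zero difference at the last index, an isotropic $\|\nabla_{xy} u\|_1$ (matching the $s_{xy}$ term of the SB algorithm), and a single-level orthonormal Haar transform per slice for $W_{xy}$ instead of a full multi-level decomposition. Volume, tvXY, tvZ, haarL1 and objectiveJ are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical layout: nz slices (time), each ny x nx, stored as v[(z * ny + y) * nx + x].
struct Volume {
    int nx, ny, nz;
    std::vector<double> v;
    double at(int x, int y, int z) const { return v[(z * ny + y) * nx + x]; }
};

// Isotropic spatial TV: sum of sqrt(dx^2 + dy^2) with forward differences;
// the difference is taken as 0 at the last column/row (assumed boundary rule).
double tvXY(const Volume& u) {
    double s = 0.0;
    for (int z = 0; z < u.nz; ++z)
        for (int y = 0; y < u.ny; ++y)
            for (int x = 0; x < u.nx; ++x) {
                double dx = (x + 1 < u.nx) ? u.at(x + 1, y, z) - u.at(x, y, z) : 0.0;
                double dy = (y + 1 < u.ny) ? u.at(x, y + 1, z) - u.at(x, y, z) : 0.0;
                s += std::sqrt(dx * dx + dy * dy);
            }
    return s;
}

// Temporal TV: sum of |u(z + 1) - u(z)| along the slice axis.
double tvZ(const Volume& u) {
    double s = 0.0;
    for (int z = 0; z + 1 < u.nz; ++z)
        for (int y = 0; y < u.ny; ++y)
            for (int x = 0; x < u.nx; ++x)
                s += std::fabs(u.at(x, y, z + 1) - u.at(x, y, z));
    return s;
}

// l1 norm of a one-level orthonormal 2D Haar transform of each slice (2x2 blocks).
double haarL1(const Volume& u) {
    assert(u.nx % 2 == 0 && u.ny % 2 == 0);
    double s = 0.0;
    for (int z = 0; z < u.nz; ++z)
        for (int y = 0; y < u.ny; y += 2)
            for (int x = 0; x < u.nx; x += 2) {
                double a = u.at(x, y, z), b = u.at(x + 1, y, z);
                double c = u.at(x, y + 1, z), d = u.at(x + 1, y + 1, z);
                s += 0.5 * (std::fabs(a + b + c + d) + std::fabs(a - b + c - d) +
                            std::fabs(a + b - c - d) + std::fabs(a - b - c + d));
            }
    return s;
}

double objectiveJ(const Volume& u) { return haarL1(u) + tvXY(u) + tvZ(u); }
```

A volume that is constant in space but jumps between two time frames has zero spatial TV, so only the temporal and wavelet terms contribute; this is the kind of separation the combined regularizer is meant to exploit.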
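7 Proposed SB CS MRI Algorithm
$J(u) = \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1 + \|W_{xy} u\|_1$
Initialize $u^0 = F^{-1} R^T f$ and $d_x^0 = d_y^0 = d_z^0 = w^0 = 0$
While $\|u^k - u^{k-1}\|_2 > tol$:
  Sub-optimization problem:
    $u^{k+1} = \arg\min_u \frac{\mu}{2}\|Ku - f\|_2^2 + \frac{\lambda}{2}\|d^k - \nabla u - b^k\|_2^2 + \frac{\gamma}{2}\|w^k - Wu - b_w^k\|_2^2$
  Update Bregman distances (smoothing/thresholding):
    $d_x^{k+1} = \max(s_{xy}^k - 1/\lambda, 0)\,(\nabla_x u^{k+1} + b_x^k)/s_{xy}^k$
    $d_y^{k+1} = \max(s_{xy}^k - 1/\lambda, 0)\,(\nabla_y u^{k+1} + b_y^k)/s_{xy}^k$
    $d_z^{k+1} = \max(s_z^k - 1/\lambda, 0)\,(\nabla_z u^{k+1} + b_z^k)/s_z^k$
    $w^{k+1} = \mathrm{shrink}(W u^{k+1} + b_w^k, 1/\gamma)$
  Update Bregman variables:
    $b_x^{k+1} = b_x^k + \nabla_x u^{k+1} - d_x^{k+1}$
    $b_y^{k+1} = b_y^k + \nabla_y u^{k+1} - d_y^{k+1}$
    $b_z^{k+1} = b_z^k + \nabla_z u^{k+1} - d_z^{k+1}$
    $b_w^{k+1} = b_w^k + W u^{k+1} - w^{k+1}$
End
Here $\nabla = (\nabla_x, \nabla_y, \nabla_z)$, $d^k = (d_x^k, d_y^k, d_z^k)$, $b^k = (b_x^k, b_y^k, b_z^k)$, $s_{xy}^k = \sqrt{|\nabla_x u^{k+1} + b_x^k|^2 + |\nabla_y u^{k+1} + b_y^k|^2}$, $s_z^k = |\nabla_z u^{k+1} + b_z^k|$, and $\mathrm{shrink}(x, t) = \frac{x}{|x|}\max(|x| - t, 0)$.
Ref: Goldstein et al., The Split Bregman Method for L1-Regularized Problems, SIAM.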
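The thresholding steps of the algorithm reduce to two small per-pixel functions. This C++ sketch (hypothetical names shrink and shrinkXY) uses complex values because MRI images are complex; the threshold t would be $1/\lambda$ for the $d$ updates and $1/\gamma$ for the $w$ update. The $d_z$ update uses the scalar form, since $s_z^k$ is the magnitude of a single component.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;

// shrink(x, t) = x / |x| * max(|x| - t, 0); for real x this is ordinary soft thresholding.
cplx shrink(cplx x, double t) {
    assert(t >= 0.0);
    const double m = std::abs(x);
    if (m <= t) return cplx(0.0);   // also covers |x| == 0, avoiding 0 / 0
    return x * ((m - t) / m);
}

// Isotropic (x, y) shrinkage for d_x and d_y: gx and gy stand for
// grad_x u + b_x and grad_y u + b_y at one pixel. Both components are scaled by
// the same factor max(s - t, 0) / s with s = sqrt(|gx|^2 + |gy|^2).
void shrinkXY(cplx gx, cplx gy, double t, cplx& dx, cplx& dy) {
    const double s = std::sqrt(std::norm(gx) + std::norm(gy));  // std::norm = |z|^2
    const double scale = (s > t) ? (s - t) / s : 0.0;
    dx = gx * scale;
    dy = gy * scale;
}
```

Because the isotropic version scales the gradient vector as a whole, it preserves the gradient direction at each pixel and only shortens it, whereas applying shrink to x and y separately would give anisotropic TV.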
8 Building Blocks
- Iterative solver
- Gradient and Laplacian operators: finite difference method
- Discrete Fourier transform: CUFFT
- Discrete wavelet transform: fast GPU mixed-band algorithm
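For the finite-difference building block, a common discretization (assumed here; the slides do not specify boundaries) pairs forward differences for the gradient with backward differences for the divergence, so that the divergence is the negative adjoint of the gradient and $\nabla \cdot \nabla$ gives the 5-point Laplacian. Periodic boundaries keep the adjoint identity exact. The names Grid, gradient, divergence and laplacian are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ny x nx grid, row-major, with periodic wrap-around indexing.
struct Grid {
    int nx, ny;
    int idx(int x, int y) const { return ((y + ny) % ny) * nx + ((x + nx) % nx); }
};

// Forward differences: gx = u(x+1, y) - u(x, y), gy = u(x, y+1) - u(x, y).
void gradient(const Grid& g, const std::vector<double>& u,
              std::vector<double>& gx, std::vector<double>& gy) {
    assert(static_cast<int>(u.size()) == g.nx * g.ny);
    gx.assign(u.size(), 0.0);
    gy.assign(u.size(), 0.0);
    for (int y = 0; y < g.ny; ++y)
        for (int x = 0; x < g.nx; ++x) {
            gx[g.idx(x, y)] = u[g.idx(x + 1, y)] - u[g.idx(x, y)];
            gy[g.idx(x, y)] = u[g.idx(x, y + 1)] - u[g.idx(x, y)];
        }
}

// Backward-difference divergence, chosen so that <grad u, p> = -<u, div p>.
std::vector<double> divergence(const Grid& g, const std::vector<double>& px,
                               const std::vector<double>& py) {
    std::vector<double> d(px.size(), 0.0);
    for (int y = 0; y < g.ny; ++y)
        for (int x = 0; x < g.nx; ++x)
            d[g.idx(x, y)] = (px[g.idx(x, y)] - px[g.idx(x - 1, y)]) +
                             (py[g.idx(x, y)] - py[g.idx(x, y - 1)]);
    return d;
}

// Laplacian = div(grad u), i.e. u(x+1,y) + u(x-1,y) + u(x,y+1) + u(x,y-1) - 4 u(x,y).
std::vector<double> laplacian(const Grid& g, const std::vector<double>& u) {
    std::vector<double> gx, gy;
    gradient(g, u, gx, gy);
    return divergence(g, gx, gy);
}
```

Keeping the exact adjoint pair matters for Split Bregman: the right-hand side of the u-subproblem applies the adjoint of the gradient to $d^k - b^k$, and a mismatched pair would make the linear system inconsistent with the objective.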
9 2D Wavelet Transform: Traditional Approach
10 2D Wavelet Transform: Traditional vs. Mixed-band
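One way to read the traditional vs. mixed-band comparison, assumed here since the slide is a figure: both compute the same Haar coefficients, but the traditional approach filters rows, then columns, and stores each subband in its own quadrant, while the mixed-band approach replaces each 2x2 block in place, leaving the subbands interleaved. The sketch computes one decomposition level both ways and checks that a fixed index permutation maps one layout onto the other. haarMixed, haarTraditional and toQuadrants are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mixed-band layout (assumed): each 2x2 block of the n x n image (n even, row-major)
// is replaced in place by its four Haar coefficients, so subbands stay interleaved.
std::vector<double> haarMixed(const std::vector<double>& m, int n) {
    assert(n % 2 == 0 && static_cast<int>(m.size()) == n * n);
    std::vector<double> g(m.size());
    for (int y = 0; y < n; y += 2)
        for (int x = 0; x < n; x += 2) {
            const double a = m[y * n + x], b = m[y * n + x + 1];
            const double c = m[(y + 1) * n + x], d = m[(y + 1) * n + x + 1];
            g[y * n + x]           = 0.5 * (a + b + c + d);  // low-low
            g[y * n + x + 1]       = 0.5 * (a - b + c - d);  // high-pass along x
            g[(y + 1) * n + x]     = 0.5 * (a + b - c - d);  // high-pass along y
            g[(y + 1) * n + x + 1] = 0.5 * (a - b - c + d);  // high-high
        }
    return g;
}

// Traditional separable approach: a row pass writes low-pass | high-pass halves,
// then a column pass splits each column into top (low) and bottom (high) halves.
std::vector<double> haarTraditional(const std::vector<double>& m, int n) {
    const int h = n / 2;
    const double s = 1.0 / std::sqrt(2.0);
    std::vector<double> t(m.size()), q(m.size());
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < h; ++x) {
            t[y * n + x]     = s * (m[y * n + 2 * x] + m[y * n + 2 * x + 1]);
            t[y * n + h + x] = s * (m[y * n + 2 * x] - m[y * n + 2 * x + 1]);
        }
    for (int x = 0; x < n; ++x)
        for (int y = 0; y < h; ++y) {
            q[y * n + x]       = s * (t[2 * y * n + x] + t[(2 * y + 1) * n + x]);
            q[(h + y) * n + x] = s * (t[2 * y * n + x] - t[(2 * y + 1) * n + x]);
        }
    return q;
}

// Permutation from the mixed-band layout to the quadrant layout:
// coefficient (2i + p, 2j + q) moves to (p * n/2 + i, q * n/2 + j).
std::vector<double> toQuadrants(const std::vector<double>& g, int n) {
    const int h = n / 2;
    std::vector<double> out(g.size());
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            out[((y % 2) * h + y / 2) * n + (x % 2) * h + x / 2] = g[y * n + x];
    return out;
}
```

The mixed-band version touches each 2x2 block once and writes its results back to the same locations, which is why it fits naturally into a per-block GPU kernel; the quadrant layout can be produced afterwards if a consumer needs it.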
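11 2D Wavelet Transform with Mixed-band (1)
Haar 2x2: for a 2x2 block $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $W = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$,
$G = W M W^T = \frac{1}{2}\begin{pmatrix} a+b+c+d & a-b+c-d \\ a+b-c-d & a-b-c+d \end{pmatrix}$
[Figure panels: Haar 2x2, Haar 4x4, Haar 8x8.]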
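Extending the 2x2 step to a whole 8x8 block: following the "first/second/third time Haar 2x2" comments in the encode_8 kernel, this CPU sketch applies the same butterfly at strides 1, 2 and 4, each time only to the low-low coefficients left by the previous level, so every coefficient stays in place. Because $W$ is symmetric and $W^2 = I$, decoding can reuse the same butterfly in reverse level order. encode8, decode8 and haar2x2 are hypothetical names, and plain doubles stand in for the kernel's float2 values.

```cpp
#include <cassert>
#include <cmath>

// In-place 2x2 Haar butterfly G = W M W^T with W = [[1, 1], [1, -1]] / sqrt(2),
// applied to the entries at (y, x), (y, x + s), (y + s, x), (y + s, x + s).
// W is symmetric and W * W = I, so applying the butterfly twice restores the input.
void haar2x2(double blk[8][8], int y, int x, int s) {
    assert(y + s < 8 && x + s < 8);
    const double a = blk[y][x], b = blk[y][x + s];
    const double c = blk[y + s][x], d = blk[y + s][x + s];
    blk[y][x]         = 0.5 * (a + b + c + d);
    blk[y][x + s]     = 0.5 * (a - b + c - d);
    blk[y + s][x]     = 0.5 * (a + b - c - d);
    blk[y + s][x + s] = 0.5 * (a - b - c + d);
}

// Three-level mixed-band encode of one 8x8 block: the level with stride s works on
// the low-low coefficients at positions that are multiples of s.
void encode8(double blk[8][8]) {
    for (int s = 1; s <= 4; s *= 2)
        for (int y = 0; y < 8; y += 2 * s)
            for (int x = 0; x < 8; x += 2 * s)
                haar2x2(blk, y, x, s);
}

// Decode runs the same butterflies with strides 4, 2, 1.
void decode8(double blk[8][8]) {
    for (int s = 4; s >= 1; s /= 2)
        for (int y = 0; y < 8; y += 2 * s)
            for (int x = 0; x < 8; x += 2 * s)
                haar2x2(blk, y, x, s);
}
```

After encoding, blk[0][0] holds the block sum divided by 8 (the coarsest low-low coefficient), and the transform preserves the sum of squares because each butterfly is orthonormal.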
12 2D Wavelet Transform with Mixed-band (2)
[Figure: encode_8 kernel and decode_8 kernel.]
Why do we choose a block size of 8x8?
13 Optimize 2D Haar Wavelet (1)
$G = \frac{1}{2}\begin{pmatrix} a+b+c+d & a-b+c-d \\ a+b-c-d & a-b-c+d \end{pmatrix}$
Annotations: the four inputs of each 2x2 block are broadcast from shared memory to its four threads (broadcasting); the if/else chain that selects each output causes branch divergence.

    __global__ void encode_8(float2* src, float2* dst, int nrows, int ncols, int irows, int icols)
    {
        // float2 arithmetic assumes operator overloads such as those in helper_math.h from the CUDA samples
        __shared__ float2 smem[8][8];
        const dim3 tid = threadIdx;
        // Read an 8x8 block from global memory to shared memory...
        __syncthreads();
        float2 a, b, c, d; // Registers: each thread holds its own values
        // First Haar 2x2 pass (mask 0: every thread is active at this level)
        if (((tid.y & 0) == 0) && ((tid.x & 0) == 0)) {
            a = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 0];
            b = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 1];
            c = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 0];
            d = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 1];
        }
        __syncthreads();
        // Each thread writes the coefficient that belongs at its own (y, x) position
        if      (((tid.y & 1) == 0) && ((tid.x & 1) == 0)) smem[tid.y][tid.x] = 0.5f * (a + b + c + d);
        else if (((tid.y & 1) == 0) && ((tid.x & 1) == 1)) smem[tid.y][tid.x] = 0.5f * (a - b + c - d);
        else if (((tid.y & 1) == 1) && ((tid.x & 1) == 0)) smem[tid.y][tid.x] = 0.5f * (a + b - c - d);
        else                                               smem[tid.y][tid.x] = 0.5f * (a - b - c + d);
        __syncthreads();
        // Second Haar 2x2 pass...
        // Third Haar 2x2 pass...
    }
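14 Optimize 2D Haar Wavelet (2)
With the 2x2 block flattened as $(a, b, c, d)$: $G = H_2\,(a, b, c, d)^T = \frac{1}{2}\,(a+b+c+d,\; a-b+c-d,\; a+b-c-d,\; a-b-c+d)^T$.
Recursive representation: $H_d = \frac{1}{\sqrt{2}}\begin{pmatrix} +H_{d-1} & +H_{d-1} \\ +H_{d-1} & -H_{d-1} \end{pmatrix}$.
Synthetic representation: $H_d(i, j) = \frac{1}{2^{d/2}}\,(-1)^{i \odot j}$, where $\odot$ is the bitwise dot product. Example: $3 \odot 2 = (1,1)\cdot(1,0) = 1$, so $H_2(3, 2) = -\frac{1}{2}$.
The if/else chain is replaced by switchsign, which sets each input's sign from this bitwise dot product:

    __device__ void switchsign(unsigned int intsign, float2* number)
    {
        *number *= intsign ? (-1.0f) : (1.0f);
    }

    __global__ void encode_8(float2* src, float2* dst, int nrows, int ncols, int irows, int icols)
    {
        // float2 arithmetic assumes operator overloads such as those in helper_math.h from the CUDA samples
        __shared__ float2 smem[8][8];
        const dim3 tid = threadIdx;
        // Read an 8x8 block from global memory to shared memory...
        __syncthreads();
        float2 a, b, c, d; // Registers: each thread holds its own values
        // First Haar 2x2 pass
        if (((tid.y & 0) == 0) && ((tid.x & 0) == 0)) {
            a = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 0];
            b = smem[(tid.y & (~1)) + 0][(tid.x & (~1)) + 1];
            c = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 0];
            d = smem[(tid.y & (~1)) + 1][(tid.x & (~1)) + 1];
            // Sign of each input: parity of (output bit AND input bit), i.e. (-1)^(i . j)
            switchsign((((tid.y >> 0) & 1) & 0) ^ (((tid.x >> 0) & 1) & 0), &a);
            switchsign((((tid.y >> 0) & 1) & 0) ^ (((tid.x >> 0) & 1) & 1), &b);
            switchsign((((tid.y >> 0) & 1) & 1) ^ (((tid.x >> 0) & 1) & 0), &c);
            switchsign((((tid.y >> 0) & 1) & 1) ^ (((tid.x >> 0) & 1) & 1), &d);
        }
        __syncthreads(); // all threads finish reading before any thread overwrites smem
        if (((tid.y & 0) == 0) && ((tid.x & 0) == 0))
            smem[tid.y][tid.x] = 0.5f * (a + b + c + d);
        __syncthreads();
        // Second Haar 2x2 pass...
        // Third Haar 2x2 pass...
    }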
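The two representations of $H_d$ can be checked against each other on the CPU. In the kernel, the switchsign argument ((iy & jy) ^ (ix & jx)) is exactly this bit parity for 2-bit indices. The sketch below builds $H_d$ recursively and compares every entry with the closed form $2^{-d/2}(-1)^{i \odot j}$; hadamardRecursive and hadamardEntry are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Recursive form: H_0 = [1], H_d = (1/sqrt 2) [[H_{d-1}, H_{d-1}], [H_{d-1}, -H_{d-1}]].
Matrix hadamardRecursive(int d) {
    if (d == 0) return Matrix{{1.0}};
    const Matrix h = hadamardRecursive(d - 1);
    const std::size_t m = h.size();
    const double s = 1.0 / std::sqrt(2.0);
    Matrix out(2 * m, std::vector<double>(2 * m));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            out[i][j]         = s * h[i][j];
            out[i][j + m]     = s * h[i][j];
            out[i + m][j]     = s * h[i][j];
            out[i + m][j + m] = -s * h[i][j];
        }
    return out;
}

// Closed form: the sign is (-1) raised to the parity of popcount(i & j),
// which is the bitwise dot product of the two indices.
double hadamardEntry(int d, unsigned i, unsigned j) {
    const bool odd = std::bitset<32>(i & j).count() % 2 == 1;
    return (odd ? -1.0 : 1.0) / std::pow(2.0, d / 2.0);
}
```

The closed form lets every thread compute its own signs from its indices alone, which is what removes the need for per-position branches.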
15 Optimize 2D Haar Wavelet (3)
A complex number $y = m + i\,n$ is stored as a float2, with $m$ in .x and $n$ in .y. Two versions of switchsign:
Multiply with -1 or +1:

    __device__ void switchsign(unsigned int intsign, float2* number)
    {
        *number *= intsign ? (-1.0f) : (1.0f); // float2 *= float, e.g. from helper_math.h
    }

Cast and flip the sign bits:

    __device__ void switchsign(unsigned int intsign, float2* number)
    {
        // Reinterpret both floats as one 64-bit word and XOR their sign bits (mask constants not shown)
        *((long long int*)number) ^= intsign ? 0x… : 0x…;
    }

[Table: 2D wavelet transform with mixed-band, image size 512 x 512, unit milliseconds. Rows: 2D forward wavelet, 2D inverse wavelet. Column groups: shared memory; multiply with -1 or +1; casting sign bit; each at 1, 3 and 9 levels.]
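A host-side C++ sketch of the two switchsign variants. Float2 is a stand-in for CUDA's float2, and the mask 0x8000000080000000 is our assumption: bit 31 is the sign bit of an IEEE-754 single-precision float, and setting it in both 32-bit halves flips both components regardless of which half holds .x. On the host, std::memcpy replaces the slide's pointer cast to stay within C++ aliasing rules. switchSignMul and switchSignXor are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Float2 { float x, y; };  // stand-in for CUDA's float2 (real part x, imaginary part y)

// Multiply version: one floating-point multiply per component.
void switchSignMul(unsigned intsign, Float2* z) {
    assert(z != nullptr);
    const float s = intsign ? -1.0f : 1.0f;
    z->x *= s;
    z->y *= s;
}

// Bit version: flip the sign bit of both floats with a single 64-bit XOR.
void switchSignXor(unsigned intsign, Float2* z) {
    assert(z != nullptr);
    static_assert(sizeof(Float2) == sizeof(std::uint64_t), "expects two packed 32-bit floats");
    const std::uint64_t mask = intsign ? 0x8000000080000000ULL : 0ULL;
    std::uint64_t bits;
    std::memcpy(&bits, z, sizeof bits);  // copy the raw bits instead of casting the pointer
    bits ^= mask;
    std::memcpy(z, &bits, sizeof bits);
}
```

For ordinary finite values both versions give identical results (including turning 0.0 into -0.0); the bit version simply replaces arithmetic with an integer operation.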
16 Comparison on Lenna (512x512): Full Decomposition, 9 Levels
- Filter scheme (GPU): … ms
- Lifting scheme (GPU): … ms
- Mixed-band (GPU): … ms (20x faster)
17 Putting Everything Together on 1 GPU
$\min_u J(u)$ s.t. $\sum_i \|K u_i - f_i\|_2^2 < \mu$, with $J(u) = \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1 + \|W_{xy} u\|_1$
18 Results of 2D Dynamic MRI
[Figure: flank tumor dataset (256 slices, image size 128x128) reconstructed at sampling rates 1/1, 1/4, 1/8, 1/10, 1/12 and 1/16.]
19 Performance of the CS MRI Reconstruction (in milliseconds)
[Table: time per operation for 32, 128 and 256 slices, image size 128x128. Operations: inverse differentiation; compute right-hand side; sub-optimization problem (modified Richardson); forward differentiation; shrinkage; update Bregman parameter; update k-space; total.]
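The table lists modified Richardson iteration as the solver for the sub-optimization problem. Setting the gradient of that objective to zero gives the linear system $(\mu K^{*}K + \lambda \nabla^{*}\nabla + \gamma W^{*}W)\,u = \mu K^{*}f + \lambda \nabla^{*}(d^k - b^k) + \gamma W^{*}(w^k - b_w^k)$, where $^{*}$ denotes the adjoint; it is positive definite when $W$ is orthonormal and $\gamma > 0$. The sketch below is the textbook stationary Richardson iteration $u \leftarrow u + \omega(\text{rhs} - Au)$ with a fixed step $\omega$; what the talk's "modified" variant changes is not stated, so treat this as an illustration only. richardson, Vec and LinearOp are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using LinearOp = std::function<Vec(const Vec&)>;

// Stationary Richardson iteration for A u = rhs: u <- u + omega * (rhs - A u).
// For symmetric positive definite A it converges when 0 < omega < 2 / lambda_max(A).
// Stops after maxIter steps or once the residual norm (before the update) is below tol.
Vec richardson(const LinearOp& A, const Vec& rhs, Vec u, double omega, int maxIter, double tol) {
    assert(omega > 0.0 && rhs.size() == u.size());
    for (int it = 0; it < maxIter; ++it) {
        const Vec Au = A(u);
        double rnorm2 = 0.0;
        for (std::size_t i = 0; i < u.size(); ++i) {
            const double r = rhs[i] - Au[i];
            rnorm2 += r * r;
            u[i] += omega * r;
        }
        if (std::sqrt(rnorm2) < tol) break;
    }
    return u;
}
```

For example, for a 1D stand-in $A = \mu I - \lambda\Delta$ with a periodic second-difference $\Delta$, the eigenvalues lie in $[\mu, \mu + 4\lambda]$, so the midpoint step $\omega = 2/(\lambda_{\min} + \lambda_{\max}) = 1/(\mu + 2\lambda)$ converges. Each iteration needs only operator applications, which maps well onto GPU kernels.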
20 Multi-GPU System Information
[Figure: lspci -tv device tree showing NVIDIA Tesla M2090 GPUs attached through AMD RD890 northbridges (dual-slot 2x8 PCIe GFX, Hydra part); label: MPI.]
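21 Multi-GPU Implementation (1)
Host-side parallelism: OpenMP. Objective: $J(u) = \|\nabla_{xy} u\|_1 + \|\nabla_z u\|_1 + \|W_{xy} u\|_1$
Peer-to-peer copies between GPUs with cudaMemcpyPeer: ~6.1 GB/s.
Ref: Paulius M., Implementing 3D Finite Difference Code on GPUs, GTC.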
22 Multi-GPU Implementation (2)
23 Multi-GPU Implementation (3)
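The slides pair OpenMP with peer-to-peer copies and cite a 3D finite-difference reference. A common way to combine these, assumed here rather than confirmed by the slides, is to split the slices (the $z$, time axis) across GPUs: $\nabla_{xy}$ and $W_{xy}$ act within a slice, while $\nabla_z$ couples neighbouring slices, so each GPU needs one halo slice from its successor. The CPU sketch below simulates that decomposition with plain vectors; Part, partition, exchangeHalos and gradZ are hypothetical names, and the zero difference at the last slice is an assumed boundary rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slice = std::vector<double>;

// One simulated GPU: owns slices [z0, z0 + owned.size()) plus one halo slice, a copy
// of the next GPU's first slice (empty on the GPU that owns the last slice).
struct Part {
    int z0 = 0;
    std::vector<Slice> owned;
    Slice halo;
};

// Split nz slices into p contiguous, nearly equal chunks (each with at least one slice).
std::vector<Part> partition(const std::vector<Slice>& vol, int p) {
    const int nz = static_cast<int>(vol.size());
    assert(p > 0 && p <= nz);
    std::vector<Part> parts(p);
    for (int k = 0, z = 0; k < p; ++k) {
        const int count = nz / p + (k < nz % p ? 1 : 0);
        parts[k].z0 = z;
        parts[k].owned.assign(vol.begin() + z, vol.begin() + z + count);
        z += count;
    }
    return parts;
}

// Halo exchange: each part receives the first owned slice of its successor.
// On a real multi-GPU system this copy would be a peer-to-peer transfer (e.g. cudaMemcpyPeer).
void exchangeHalos(std::vector<Part>& parts) {
    for (std::size_t k = 0; k + 1 < parts.size(); ++k)
        parts[k].halo = parts[k + 1].owned.front();
}

// Local forward difference along z, using the halo for the last owned slice.
std::vector<Slice> gradZ(const Part& part) {
    std::vector<Slice> g;
    for (std::size_t i = 0; i < part.owned.size(); ++i) {
        const Slice* next = (i + 1 < part.owned.size()) ? &part.owned[i + 1]
                          : (part.halo.empty() ? nullptr : &part.halo);
        Slice d(part.owned[i].size(), 0.0);
        if (next)
            for (std::size_t j = 0; j < d.size(); ++j) d[j] = (*next)[j] - part.owned[i][j];
        g.push_back(d);
    }
    return g;
}
```

With the halo in place, the distributed $\nabla_z$ matches the single-device result slice for slice; the cost is one slice transfer per GPU boundary per application, which is where the peer-to-peer bandwidth enters.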
24 Performance on Multiple GPUs
[Table: time per operation; columns: 1 GPU; 2 GPUs and sync; 4 GPUs and sync; … GPUs and sync. Operations: inverse differentiation; compute right-hand side; sub-optimization problem; forward differentiation; shrinkage; update Bregman parameter; update k-space; data transfer; total.]
25 Scalability
[Figure: time (milliseconds) versus number of GPUs, annotated with speedups of 1.7x, 2.9x and 4.3x.]
26 Conclusion
Summary:
- Split Bregman formulation for dynamic CS MRI
- 2D DWT on the GPU using the mixed-band algorithm
- Multi-GPU implementation using P2P communication
Acknowledgement: Thanks to HyungJoon Cho and SoHyun Han for data and discussion. Funding from NRF Grant # 2012R1A1A
27 Thank you
Implementing 3D Finite Difference Codes on the GPU Paulius Micikevicius NVIDIA Outline Single GPU Implementation 2-pass and 1-pass approaches Performance evaluation Multi-GPU Implementation Scalability
More informationCompressive Sensing MRI with Wavelet Tree Sparsity
Compressive Sensing MRI with Wavelet Tree Sparsity Chen Chen and Junzhou Huang Department of Computer Science and Engineering University of Texas at Arlington cchen@mavs.uta.edu jzhuang@uta.edu Abstract
More informationWarps and Reduction Algorithms
Warps and Reduction Algorithms 1 more on Thread Execution block partitioning into warps single-instruction, multiple-thread, and divergence 2 Parallel Reduction Algorithms computing the sum or the maximum
More informationGTC 2017 S7672. OpenACC Best Practices: Accelerating the C++ NUMECA FINE/Open CFD Solver
David Gutzwiller, NUMECA USA (david.gutzwiller@numeca.com) Dr. Ravi Srinivasan, Dresser-Rand Alain Demeulenaere, NUMECA USA 5/9/2017 GTC 2017 S7672 OpenACC Best Practices: Accelerating the C++ NUMECA FINE/Open
More informationGradient Free Design of Microfluidic Structures on a GPU Cluster
Gradient Free Design of Microfluidic Structures on a GPU Cluster Austen Duffy - Florida State University SIAM Conference on Computational Science and Engineering March 2, 2011 Acknowledgements This work
More informationModule Memory and Data Locality
GPU Teaching Kit Accelerated Computing Module 4.5 - Memory and Data Locality Handling Arbitrary Matrix Sizes in Tiled Algorithms Objective To learn to handle arbitrary matrix sizes in tiled matrix multiplication
More informationTotal Variation Regularization Method for 3-D Rotational Coronary Angiography
Total Variation Regularization Method for 3-D Rotational Coronary Angiography Haibo Wu 1,2, Christopher Rohkohl 1,3, Joachim Hornegger 1,2 1 Pattern Recognition Lab (LME), Department of Computer Science,
More informationHigh dynamic range magnetic resonance flow imaging in the abdomen
High dynamic range magnetic resonance flow imaging in the abdomen Christopher M. Sandino EE 367 Project Proposal 1 Motivation Time-resolved, volumetric phase-contrast magnetic resonance imaging (also known
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationHighly Efficient Compensationbased Parallelism for Wavefront Loops on GPUs
Highly Efficient Compensationbased Parallelism for Wavefront Loops on GPUs Kaixi Hou, Hao Wang, Wu chun Feng {kaixihou, hwang121, wfeng}@vt.edu Jeffrey S. Vetter, Seyong Lee vetter@computer.org, lees2@ornl.gov
More informationLimited view X-ray CT for dimensional analysis
Limited view X-ray CT for dimensional analysis G. A. JONES ( GLENN.JONES@IMPERIAL.AC.UK ) P. HUTHWAITE ( P.HUTHWAITE@IMPERIAL.AC.UK ) NON-DESTRUCTIVE EVALUATION GROUP 1 Outline of talk Industrial X-ray
More informationImage Filtering, Warping and Sampling
Image Filtering, Warping and Sampling Connelly Barnes CS 4810 University of Virginia Acknowledgement: slides by Jason Lawrence, Misha Kazhdan, Allison Klein, Tom Funkhouser, Adam Finkelstein and David
More informationACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS
ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation
More information