Efficient Imaging Algorithms on Many-Core Platforms

H. Köstler, Dagstuhl, 22.11.2011

Contents
Imaging Applications
  HDR Compression: performance of PDE-based models
  Image Denoising: performance of patch-based models
  Image Deblurring: algorithmic problems
  Image Segmentation: modelling problems

HDR COMPRESSION: An efficient multigrid solver in action

HDR Compression of 2D X-ray Images
[Figure: original image (960x960) and its HDR-compressed version. Data: Siemens AG, Healthcare Sector]

HDR Compression
The dynamic range of an image is the ratio between the brightest and darkest portions of the image that are accurately captured or observed.
HDR compression is used to bring out more detail in the image.
Based on "Gradient Domain High Dynamic Range Compression" [Fattal/Lischinski/Werman], SIGGRAPH 2002

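HDR Compression
Idea: Modify the magnitude of the image gradient by applying a position-dependent attenuating function $\Phi : \mathbb{R}^2 \to \mathbb{R}$:
$C = \nabla I \, \Phi$
$\Phi$ is computed on different image resolutions $l = 0, \dots, L$ by
$\varphi_l = \frac{\alpha}{\|\nabla I_l\|} \left( \frac{\|\nabla I_l\|}{\alpha} \right)^{\beta}$
$\Phi_L = \varphi_L, \quad \Phi_l = P(\Phi_{l+1}) \, \varphi_l, \quad \Phi = \Phi_0$
$\alpha$ determines which gradient magnitudes are left unchanged; $\beta < 1$ is the attenuating factor for the larger gradients. $P$ transfers $\Phi_{l+1}$ to the next finer level $l$.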
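The multi-resolution recursion for $\Phi$ can be sketched in NumPy as follows. This is a minimal illustration, not the implementation from the talk: the helper names (`gradient_magnitude`, `restrict`, `prolong`, `attenuation`), the forward-difference gradient, the 2x2 averaging, the piecewise-constant interpolation used for $P$, and the small `eps` that guards against division by zero are all assumptions made here.

```python
import numpy as np

def gradient_magnitude(img):
    """Forward-difference gradient magnitude (differences are zero at the far border)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return np.hypot(gx, gy)

def restrict(img):
    """Coarsen by averaging 2x2 blocks (assumes even side lengths)."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def prolong(img):
    """Piecewise-constant interpolation to the next finer level (assumed choice of P)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def attenuation(img, levels, alpha, beta, eps=1e-6):
    """Return Phi on the finest level, using `levels` additional coarser levels."""
    # Image pyramid I_0 (finest) ... I_L (coarsest).
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(restrict(pyramid[-1]))
    # phi_l = (alpha / |grad I_l|) * (|grad I_l| / alpha)**beta on every level.
    phis = []
    for level in pyramid:
        mag = np.maximum(gradient_magnitude(level), eps)
        phis.append((alpha / mag) * (mag / alpha) ** beta)
    # Phi_L = phi_L, then Phi_l = P(Phi_{l+1}) * phi_l down to l = 0.
    Phi = phis[-1]
    for phi in reversed(phis[:-1]):
        Phi = prolong(Phi) * phi
    return Phi
```

With `levels = 0` the result is just $\varphi_0$: pixels whose gradient magnitude equals `alpha` get the factor 1, and larger gradients get a factor below 1 when `beta < 1`.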

Imaging in Gradient Space
Energy functional:
$E(u) = \int_\Omega \|\nabla u(x) - C\|_2^2 \, dx \to \min$
Euler-Lagrange equations:
$\Delta u = \operatorname{div} C = f$
Solve the PDE by multigrid:
$\Delta u = f$ in $\Omega$, $u = 0$ on $\partial\Omega$
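To make the linear system concrete, the following sketch solves the discrete Poisson problem with homogeneous Dirichlet boundary values by plain Jacobi iteration. Jacobi is used here only as a simple stand-in for the multigrid solver of the talk; the function names, the unit mesh width, and the 5-point stencil are assumptions of this example, and the right-hand side `f` is taken as already assembled.

```python
import numpy as np

def laplacian(u):
    """5-point Laplacian with mesh width 1 on interior points; border entries stay 0."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1]
                       + u[1:-1, :-2] + u[1:-1, 2:]
                       - 4.0 * u[1:-1, 1:-1])
    return lap

def jacobi_poisson(f, sweeps, omega=1.0):
    """Approximate Laplace(u) = f with u = 0 on the boundary by (damped) Jacobi sweeps."""
    u = np.zeros_like(f)
    for _ in range(sweeps):
        residual = f - laplacian(u)
        # The stencil diagonal is -4, so the Jacobi correction divides by -4.
        u[1:-1, 1:-1] += omega * residual[1:-1, 1:-1] / (-4.0)
    return u
```

Jacobi needs a number of sweeps that grows with the grid size, which is exactly the cost a multigrid cycle avoids by treating the smooth error components on coarser grids.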

General Software Features
  High dynamic range compression for 2D CT images on GPU/CPU
  Image histogram computation and windowing on CPU/GPU
  Interactive computation on GPU with user-input parameters
  Interactive visualization of results with OpenGL on GPU

Runtime Distribution for one Frame
  transfer to device: 7%
  setup RHS, output: 10%
  transfer from device: 17%
  multigrid solver: 66%
Processing steps: the frame is transferred to gradient space, the gradients are scaled, and the processed image is restored.

Frames per second for HDR Compression
[Figure: fps, fps (solver), and fps (CPU) for image sizes 1024x1024, 1024x2048, 2048x2048, 2048x4096, and 4096x4096]
CPU: Intel Core2 Quad Q9550 @ 2.83 GHz, OpenMP (4 cores); GPU: GTX 295

Optimized HDR Compression
[Figure: fps for HDR compression (size 2048x2048) on GTX 295/2, GTX 480, and GTX 480 (wavefront)]
  Half of an NVIDIA GTX 295: 112 GB/s peak bandwidth, compute capability 1.3
  NVIDIA GTX 480: 177 GB/s peak bandwidth, compute capability 2.0 (Fermi)

HDR Compression Results
[Figure: original image (960x960) and HDR-compressed image. Data: Siemens AG, Healthcare Sector]

HDR Compression Results
[Figure: attenuating function Φ. Data: Siemens AG, Healthcare Sector]

Hierarchical Hybrid Grids (HHG)
  Solve the 3D Poisson equation on an unstructured tetrahedral input grid
  Bey's tetrahedral refinement
  Finite element discretization
  Patch-wise regular refinement generates nested grid hierarchies that are naturally suited to geometric multigrid algorithms

Data sets for 3D HDR Compression
[Figure: MRI data provided by Universitätsklinikum Erlangen; tetrahedral finite element mesh used in HHG]

Strong Scaling for Multigrid Solver on Jugene
[Figure: runtime for one V(2,2) cycle in ms for 512x512x288, 1024x1024x576, and 2048x2048x1152]
The ratio computation : communication is about 3 : 1.
Setups:
  512x512x288: 30 465 215 unknowns = 40% mesh cover (16 patches per direction), 5646 cores
  1024x1024x576: 201 476 955 unknowns = 33% (32 patches per direction), 37158 cores
  2048x2048x1152: 1 617 632 955 unknowns = 33% (32 patches per direction), 37158 cores

Performance for one V(2,2)-cycle
Platforms: PowerPC 450, Xeon 5550, M1060, C2050, GTX 480
  2D const stencil (5p): 798.9 Mu/s, 1613 Mu/s
  2D variable stencil (5p, complex): 86.2 Mu/s, 418.7 Mu/s
  3D const stencil (7p): 7.4 Mu/s, 26.8 Mu/s, 93.2 Mu/s
  3D variable stencil (7p): 11.2 Mu/s, 32.9 Mu/s, 88.3 Mu/s
For strong scaling on Jugene (PowerPC 450) we achieve 2.7 Mu/s per node and in total 12.3 Gu/s in HHG!

IMAGE DENOISING: Sparse coding

Image Denoising of 3D CT Volume
[Figure: 3D CT volume. Data: Siemens AG, Healthcare Sector]

Noise Model
Assumption: The relation between an original, unknown image $u : \Omega \subset \mathbb{R}^d \to \mathbb{R}$ and an observed image $u_0$ can be expressed by
$u_0 = u + \eta$
where $\eta$ stands for the noise, which is estimated locally.

Image Denoising Models
Variational approach
  Requires the solution of a nonlinear diffusion-based PDE
  Done by a multigrid solver
Wavelet-based approach
  Thresholding of coefficients based on noise variance
  The Haar wavelet proves to be most efficient
Sparse coding
  Image is coded patch-wise by a sparse representation in an overcomplete basis
  Coefficients are computed by the Batch-OMP algorithm
M. Mayer, A. Borsdorf, H. Koestler, J. Hornegger, and U. Ruede, Nonlinear Diffusion vs. Wavelet Based Noise Reduction in CT Using Correlation Analysis, VMV 2007

Patch-based Image Processing
Image processing on many overlapping sub-blocks called patches
M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, 2010.

Sample Dictionaries
[Figure: sample dictionaries]

Sparse Coding
Patch $x$ is represented by a linear combination of a few atoms of an overcomplete dictionary $D$
Dictionary $D$: matrix comprising prototype signal-atoms (an extension beyond basis vectors spanning the vector space)
Find the sparsest representation $\hat{a}$ for $x$:
$\hat{a} = \arg\min_a \|a\|_0 \quad \text{subject to} \quad \|Da - x\|_2^2 \le \varepsilon$

Batch-OMP Algorithm
Solving an underdetermined linear system while finding the sparsest solution is NP-hard in general
Efficient Orthogonal Matching Pursuit (OMP):
  Greedy algorithm, selects atoms sequentially
  Select the atom with the highest correlation to the current residual
  Project the signal orthogonally onto the span of the selected atoms
More efficient Batch-OMP on GPU:
  No need to compute the residual
  Progressive Cholesky update instead of full pseudoinverse computation
D. Bartuschat, M. Stürmer, and H. Köstler, An orthogonal matching pursuit algorithm for image denoising on the Cell Broadband Engine, Parallel Processing and Applied Mathematics, 557-566, 2010.
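The greedy steps of plain OMP listed above can be sketched as follows. Note that this is the basic variant, not Batch-OMP: it recomputes the residual explicitly and uses a full least-squares solve (`numpy.linalg.lstsq`) for the projection, where Batch-OMP would use precomputed products and a progressive Cholesky update. The function name `omp` and its parameters are assumptions of this example, and the atoms (columns of `D`) are assumed to have unit norm.

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-10):
    """Plain OMP sketch: D holds unit-norm atoms as columns, x is the patch vector."""
    residual = x.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    while len(support) < max_atoms and residual @ residual > tol:
        # Select the atom with the highest correlation to the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        # Orthogonal projection of x onto the span of the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    a = np.zeros(D.shape[1])
    if support:
        a[support] = coeffs
    return a
```

The loop stops either after `max_atoms` atoms (a fixed sparsity target) or once the squared residual norm falls below `tol` (an error target corresponding to $\varepsilon$).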

Batch-OMP Algorithm
[Figure: Batch-OMP algorithm listing; the numbers in parentheses refer to its lines]
  Find next atom (3, 4, 11)
  Matrix multiplication for initial data (1)
  Substitutions (5-9)
  Projection (10)
  Error update (12-13)
R. Rubinstein, M. Zibulevsky, M. Elad, Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit, Technion, 2008.

Contribution to overall Batch-OMP Runtime
[Figure: runtime contributions on a Fermi GPU and on the Cell Broadband Engine]

Patches per second for Batch-OMP
[Figure]

Performance Batch-OMP for single compute unit
[Figure]

Performance Multigrid vs. Batch-OMP
To reach the same frame rate (120 fps) as our optimized multigrid solver for an image of size 2048 x 2048, we can process
  85000 patches if we select only 1 atom
  17000 patches if we select 16 atoms
For comparison, the image contains around 65000 non-overlapping patches of size 8x8.
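The patch counts can be checked with a few lines of arithmetic. The helper names are made up for this example, and the throughput figures rest on the reading that the quoted patch budgets are per frame at 120 fps.

```python
def non_overlapping_patches(image_side, patch_side):
    """Number of disjoint patch_side x patch_side patches in a square image."""
    return (image_side // patch_side) ** 2

def patches_per_second(patches_per_frame, frames_per_second):
    """Throughput implied by a per-frame patch budget at a given frame rate."""
    return patches_per_frame * frames_per_second

print(non_overlapping_patches(2048, 8))   # 65536
print(patches_per_second(85000, 120))     # 10200000
print(patches_per_second(17000, 120))     # 2040000
```

So the 1-atom budget covers all 65536 disjoint 8x8 patches of a 2048 x 2048 image, while the 16-atom budget covers roughly a quarter of them.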

IMAGE DEBLURRING: Variational and sparse coding approach

Image Deblurring
[Figure: data provided by G. Donnert, MPI Göttingen]

Image Deblurring
Assumption: Image $u$ is blurred (convolved) by a PSF or kernel $K$, resulting in the blurred image $x$:
$Ku = x$
In case of a noise-free $u$, the deblurred image is given by
$u = K^{-1} x$
In case of a noisy $u$, we have to take into account (with original image $u^*$ and additive noise $n$)
$u = u^* + n$

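Simple Variational Model for Image Deblurring
Energy functional becomes
$E[u] = \int_\Omega \left( (Ku - x)^2 + \alpha \|\nabla u\|_2^2 \right) dx$
Resulting Euler-Lagrange equations:
$(-\alpha \Delta + K^{*}K)\, u = f$ in $\Omega$, $\langle \nabla u, n \rangle = 0$ on $\partial\Omega$
with $f = K^{*}x$, where $K^{*}$ denotes the adjoint of $K$.
Drawback: The PSF can have large support!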
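For a convolution kernel, the regularized normal equations $(-\alpha\Delta + K^{*}K)u = K^{*}x$ that follow from this energy can be solved directly in Fourier space if periodic boundary conditions are assumed. The sketch below makes that substitution explicitly: it replaces the boundary condition of the slide by periodic ones and uses an FFT instead of a multigrid solver, so it illustrates the model rather than the method of the talk. The function name `deblur_periodic` and the kernel-centering convention are assumptions of this example.

```python
import numpy as np

def deblur_periodic(x, kernel, alpha):
    """Solve (-alpha*Laplace + K^T K) u = K^T x with periodic boundaries via FFT.

    x: blurred image, kernel: small PSF array with odd side lengths.
    Use alpha > 0 unless the kernel has no zeros in its spectrum.
    """
    rows, cols = x.shape
    # Embed the kernel in an image-sized array with its centre at index (0, 0).
    padded = np.zeros_like(x, dtype=float)
    kr, kc = kernel.shape
    padded[:kr, :kc] = kernel
    padded = np.roll(padded, (-(kr // 2), -(kc // 2)), axis=(0, 1))
    K_hat = np.fft.fft2(padded)
    # Eigenvalues of the negative periodic 5-point Laplacian (mesh width 1).
    wy = 2.0 * np.pi * np.fft.fftfreq(rows)
    wx = 2.0 * np.pi * np.fft.fftfreq(cols)
    lap = (2.0 - 2.0 * np.cos(wy))[:, None] + (2.0 - 2.0 * np.cos(wx))[None, :]
    # Each Fourier mode decouples: u_hat = conj(K_hat) x_hat / (|K_hat|^2 + alpha*lap).
    u_hat = np.conj(K_hat) * np.fft.fft2(x) / (np.abs(K_hat) ** 2 + alpha * lap)
    return np.real(np.fft.ifft2(u_hat))
```

In this periodic setting the cost does not depend on the support of the PSF, which is the difficulty the slide points out for a stencil-based solver.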

Image Deblurring Results
[Figure: original image, blurred and noisy image, deblurred image]
From: Y. Lou, A. L. Bertozzi, and S. Soatto, Direct sparse deblurring, Journal of Mathematical Imaging and Vision, pp. 1-12, 2011

Deblurring by Sparse Coding
Idea: Use the blurred dictionary $D' = KD$
$Ku = x, \quad u \approx Da \quad \Rightarrow \quad KDa = D'a \approx x$
Compute the coefficients $a$ with respect to $D'$, but then restore the deblurred image by using $D$:
$\hat{a} = \arg\min_a \|a\|_0 \quad \text{subject to} \quad \|D'a - x\|_2^2 \le \varepsilon$
No inverse problem has to be solved!
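For a single vectorized patch the idea can be sketched as below. The blur is represented by a hypothetical matrix `K` acting on the patch vector, the sparse coder is plain OMP rather than Batch-OMP, and atom selection is normalised by the atom norms because the blurred atoms are generally not unit-norm. All names are assumptions of this example.

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-12):
    """Plain OMP; correlations are normalised by the atom norms."""
    norms = np.linalg.norm(D, axis=0)
    support, coeffs = [], np.zeros(0)
    residual = x.astype(float).copy()
    while len(support) < max_atoms and residual @ residual > tol:
        k = int(np.argmax(np.abs(D.T @ residual) / norms))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        # Least-squares fit of x on the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    a = np.zeros(D.shape[1])
    if support:
        a[support] = coeffs
    return a

def deblur_patch(x, K, D, max_atoms):
    """Code the blurred patch x against D' = K D, then synthesise with the sharp D."""
    D_blurred = K @ D                   # D' = K D
    a = omp(D_blurred, x, max_atoms)    # coefficients with respect to D'
    return D @ a                        # restored patch u ~ D a
```

The blur enters only through the forward product `K @ D`; no system involving `K` is inverted, which is the point made on the slide.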

Image Deblurring Results
[Figure]

Deblurring by Sparse Coding
Open problems:
  Patch boundaries
  Best dictionary learned from original, non-blurred data

IMAGE SEGMENTATION: Muscle fibres

Image Segmentation
Goal: Extract fibres from structural images of a mouse muscle obtained from extended volume imaging.
Data provided by O. Röhrle, Universität Stuttgart, from Dane Gerneke, Auckland Bioengineering Institute at the University of Auckland, New Zealand

Image Segmentation
[Figure]

Segmentation Process
The following steps are employed during the segmentation process:
  Step 1: Pre-filtering of raw image data
  Step 2: Circle detection as an initial rough approximation to the shape of a fibre
  Step 3: Finding the final contours of the muscle fibres by the method of active contours
  Step 4: Post-processing
O. Roehrle, H. Koestler, and M. Loch, Segmentation of skeletal muscle fibers for applications in computational skeletal muscle mechanics, Computational Biomechanics for Medicine, Springer, 2011.

Automatic segmentation result
[Figure]

Future Work and Future Topics
Image Deblurring
  Explore different dictionaries
  Solve boundary problems with ideas from domain decomposition
Image Segmentation
  Include geometric/shape information in the model