A parallel patch based algorithm for CT image denoising on the Cell Broadband Engine

1 A parallel patch based algorithm for CT image denoising on the Cell Broadband Engine
Dominik Bartuschat, Markus Stürmer, Harald Köstler and Ulrich Rüde
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
October 28, 2009
Dominik Bartuschat Chair for System Simulation Page 1/22

2 Contents
1 Motivation
2 Method
3 Cell implementation
4 Results

3 CT image acquisition
- A CT scanner acquires projection data.
- The original object is reconstructed from the projections by filtered back projection.
- The resulting image exhibits different kinds of (unknown) noise.
Figure: CT geometry

4 CT image denoising
- Denoising increases the signal-to-noise ratio and thereby helps to reduce the high radiation dose of CT.
- Medically relevant information must be preserved.
- Noise in CT images is non-stationary and its distribution is unknown.
Figure: (a) CT image, (b) noise. Images provided by Anja Borsdorf, Siemens AG - Healthcare Sector.
- Adaptation to spatially changing noise behavior [BRH08]: the noise distribution is estimated from two spatially identical input images with uncorrelated noise.

5 Denoising using K-SVD
- The image is decomposed into overlapping patches.
- Each patch gets a sparse representation: a linear combination of a few noise-free dictionary atoms.
- The dictionary is trained on the noisy image itself, without learning the (Gaussian) noise.
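The decomposition into overlapping patches can be sketched in a few lines. This is a minimal illustration only, assuming the image is a list of rows and that patches are flattened row by row; the function name `extract_patches` and the `step` parameter are hypothetical and do not come from the slides.

```python
def extract_patches(image, patch_size, step=1):
    """Return (row, col, patch) for every overlapping square patch.

    Assumption: `image` is a list of equally long rows; each patch is
    flattened row by row into a single vector.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for i in range(0, rows - patch_size + 1, step):
        for j in range(0, cols - patch_size + 1, step):
            patch = [image[i + di][j + dj]
                     for di in range(patch_size)
                     for dj in range(patch_size)]
            patches.append((i, j, patch))
    return patches


# Toy 4 x 4 image with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = extract_patches(image, patch_size=2)
print(len(patches))   # 9
print(patches[0][2])  # [0, 1, 4, 5]
```

With `step=1` neighbouring patches share all but one row or column, which is why every pixel ends up covered by several patches that are averaged later.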

6 CT image denoising on Cell
- CT image denoising should be as fast as possible (real time).
- Patches do not depend on each other, so the problem is trivially parallelizable.
- Implementation on the Cell Broadband Engine Architecture:
  - heterogeneous multicore processor with distributed memory
  - comprises a PowerPC Processor Element (PPE) and Synergistic Processor Elements (SPEs)
Figure source: projects.nsf/pages/multicore.cellbe.html

8 Theory: Sparse Representations
- Remember: any vector $x \in \mathbb{R}^n$ can be represented by a linear combination of $n$ basis vectors that span the vector space.
- Idea: use more than $n$ basis vectors (atoms), so that $x$ can be represented by a linear combination of only a few atoms.
- The set of prototype signal atoms is an overcomplete dictionary $D$.
- Find the sparsest representation for $x$:
  $$\hat{a} = \arg\min_{a} \|a\|_0 \quad \text{subject to} \quad \|Da - x\|_2^2 \le \epsilon$$

11 Sparse Coding with Batch OMP
- Solving the underdetermined linear system $Da = x$ while finding the sparsest solution is NP-hard in general.
- Efficient Orthogonal Matching Pursuit (OMP): a greedy algorithm that selects atoms sequentially.
  - Select the atom with the highest correlation to the current residual $r$:
    $$\hat{k} = \arg\max_{k} |d_k^T r|, \qquad I = I \cup \{\hat{k}\}$$
  - Orthogonalization: project the signal orthogonally onto the span of the selected atoms and recompute the residual:
    $$r = x - D_I D_I^{+} x$$
- More efficient Batch OMP (R. Rubinstein, M. Elad): there is no need to compute $r$, only its product with $D^T$ (precompute $G = D^T D$):
  $$p = D^T r = p^0 - G_I (D_I^T D_I)^{-1} D_I^T x, \qquad p^0 = D^T x$$
- Instead of computing the full pseudoinverse, a progressive Cholesky update is used.
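The greedy selection and orthogonalization steps can be illustrated with a small pure-Python sketch. Note that this is plain OMP, not the Batch OMP variant: it recomputes the residual explicitly and re-factorizes the small Gram matrix $D_I^T D_I$ from scratch in every iteration instead of updating the Cholesky factor progressively. The helper names (`dot`, `solve_spd`, `omp`) and the representation of the dictionary as a list of atom vectors are assumptions made for this example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_spd(G, b):
    """Solve G c = b for a symmetric positive definite G via Cholesky."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n                      # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    c = [0.0] * n                      # back substitution: L^T c = y
    for i in reversed(range(n)):
        c[i] = (y[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c


def omp(atoms, x, eps, max_atoms):
    """Plain OMP. `atoms` is a list of (assumed unit-norm) atom vectors."""
    max_atoms = min(max_atoms, len(atoms))
    selected, coeffs = [], []
    r = list(x)
    while dot(r, r) > eps and len(selected) < max_atoms:
        # Select the not yet chosen atom with the highest |d_k^T r|.
        k_hat = max((k for k in range(len(atoms)) if k not in selected),
                    key=lambda k: abs(dot(atoms[k], r)))
        selected.append(k_hat)
        # Least-squares fit on the selected atoms (normal equations).
        G = [[dot(atoms[a], atoms[b]) for b in selected] for a in selected]
        rhs = [dot(atoms[a], x) for a in selected]
        coeffs = solve_spd(G, rhs)
        # Recompute the residual r = x - D_I c.
        r = [xi - sum(c * atoms[k][i] for c, k in zip(coeffs, selected))
             for i, xi in enumerate(x)]
    return selected, coeffs


# Overcomplete toy dictionary in R^2: two axis atoms plus one diagonal atom.
s = 2 ** -0.5
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
x = [3 * s, 3 * s]                     # exactly 3 times the diagonal atom
selected, coeffs = omp(atoms, x, eps=1e-12, max_atoms=2)
print(selected)                        # [2]
```

In the toy example a single atom suffices, whereas a plain basis of the two axis vectors would need two coefficients; this is the sparsity gain an overcomplete dictionary is meant to provide.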

12 K-SVD algorithm for image denoising
Computation of the denoised image $\hat{X}$ from the noisy image $Y$ and training of the dictionary $D$ [EA06]:
$$\hat{X} = \arg\min_{X, D, A} \Big\{ \lambda \|Y - X\|_2^2 + \sum_{ij} \mu_{ij} \|a_{ij}\|_0 + \sum_{ij} \|D a_{ij} - R_{ij} X\|_2^2 \Big\}$$
Figure: K-SVD algorithm overview, adapted from a presentation by M. Elad

13 Extension for CT-Image Denoising
- Computation of the denoised image $\hat{X}$ from the noisy image $Y$ and training of the dictionary $D$ [EA06]:
  $$\hat{X} = \arg\min_{X, D, A} \Big\{ \lambda \|Y - X\|_2^2 + \sum_{ij} \mu_{ij} \|a_{ij}\|_0 + \sum_{ij} \|D a_{ij} - R_{ij} X\|_2^2 \Big\}$$
- The estimated local noise variance $V$ serves as error tolerance for Batch OMP:
  $$\forall ij: \quad \min_{a_{ij}} \|a_{ij}\|_0 \quad \text{s.t.} \quad \|D a_{ij} - R_{ij} Y\|_2^2 \le C \cdot V(R_{ij} Y)$$
  with different $C$ for dictionary training ($C_{Train}$) and image denoising ($C_{Den}$).
- Weight $\lambda_{ij}$ for averaging the denoised and the noisy image, dependent on the local variance, $\lambda_{ij} = \lambda(V(R_{ij} Y))$:
  $$\hat{X} = \arg\min_{X} \Big\{ \sum_{ij} \lambda_{ij} \|R_{ij} Y - R_{ij} X\|_2^2 + \sum_{ij} \|D a_{ij} - R_{ij} X\|_2^2 \Big\}$$
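If the final averaging step is read as a least-squares problem in which every patch operator $R_{ij}$ only selects pixels, and both the noisy patch $R_{ij} Y$ and its sparse approximation $D a_{ij}$ are compared against $R_{ij} X$, the minimizer has a closed form per pixel: the sum of $\lambda_{ij}$ times the noisy value plus the approximated value over all patches covering the pixel, divided by the sum of $(\lambda_{ij} + 1)$ over those patches. The sketch below shows this for a 1D signal. It is an illustration under exactly these assumptions; the function name `weighted_average` and the `(start, lam, approx)` tuple layout are hypothetical.

```python
def weighted_average(noisy, patches, patch_len):
    """Per-sample closed-form solution of the weighted averaging step.

    noisy   : 1D list with the noisy signal Y
    patches : list of (start, lam, approx), where `approx` is the sparse
              approximation D a of the patch that begins at index `start`
              and `lam` is its weight lambda
    """
    num = [0.0] * len(noisy)
    den = [0.0] * len(noisy)
    for start, lam, approx in patches:
        for k in range(patch_len):
            p = start + k
            num[p] += lam * noisy[p] + approx[k]
            den[p] += lam + 1.0
    # Samples not covered by any patch keep their noisy value.
    return [n / d if d > 0 else y for n, d, y in zip(num, den, noisy)]


noisy = [1.0, 2.0, 3.0]
patches = [(0, 0.0, [0.0, 0.0]),   # weight 0: trust the approximation fully
           (1, 1.0, [0.0, 0.0])]   # weight 1: average noisy and approximated
result = weighted_average(noisy, patches, patch_len=2)
print(result[0], result[2])        # 0.0 1.5
```

A larger $\lambda_{ij}$ pulls the result towards the noisy input, a smaller one towards the dictionary approximation.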

15 Cell - Parallelization and Performance
For high performance, the following kinds of parallelism must be exploited:
- Thread-level parallelism: independent SPE threads can work concurrently on different data and tasks.
- Data-level parallelism: vector processing with SIMD, controllable by intrinsics.
- Parallelization of data transfer and computations: the programmer has to explicitly transfer data and instructions to the local storage of the SPEs and can use buffering to overlap transfers with computations.

16 Thread Parallelism
- It is unknown a priori how long an SPE thread needs to denoise its patches.
- Work is therefore distributed dynamically among the SPE threads.
- Threads are synchronized by means of an atomic counter:
  - One stripe of the noisy image is assigned to each SPE thread.
  - Having denoised its stripe, an SPE increments the atomic counter and denoises the next stripe.
- Each SPE has its own space in main memory to which denoised patches are transferred, so no synchronization is needed for writing.
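The scheduling idea can be sketched with ordinary threads standing in for SPE threads. This is not the Cell implementation: a lock-protected counter replaces the hardware atomic operation, and `AtomicCounter`, `denoise_stripes` and the `denoise_stripe` callback are hypothetical names chosen for the example.

```python
import threading


class AtomicCounter:
    """Lock-protected stand-in for an atomic fetch-and-increment."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:
            value = self._value
            self._value += 1
            return value


def denoise_stripes(num_stripes, num_workers, denoise_stripe):
    counter = AtomicCounter()
    # One private output area per worker, so writes need no synchronization.
    results = [[] for _ in range(num_workers)]

    def worker(worker_id):
        while True:
            stripe = counter.fetch_and_increment()  # claim the next stripe
            if stripe >= num_stripes:
                return
            results[worker_id].append((stripe, denoise_stripe(stripe)))

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Toy "denoising" function; which worker handles which stripe may vary.
results = denoise_stripes(10, 3, lambda stripe: stripe * stripe)
done = sorted(stripe for area in results for stripe, _ in area)
print(done == list(range(10)))  # True
```

Because a worker only claims a new stripe once it has finished the previous one, slow stripes do not hold up the other workers, which is the point of distributing the work dynamically.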

17 Data Transfer and Cache Blocking
- Image data blocks are transferred from the stripe to a buffer in local storage; patches to be transferred to local storage have to be aligned in memory.
- The SPE extracts patches from the buffers (byte shuffling, SIMD), performs Batch OMP and reconstructs the patches.
- Two buffers are needed at once to extract overlapping patches.
- Multibuffering is used to overlap computations and data transfer.
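The overlap of transfers and computation can be illustrated with a double-buffering sketch. A single background thread stands in for an asynchronous DMA transfer here; the names `process_blocks`, `load_block` and `compute` are hypothetical, and the sketch makes no claim about how the buffers are managed on the SPEs.

```python
from concurrent.futures import ThreadPoolExecutor


def process_blocks(load_block, compute, num_blocks):
    """Process blocks in order while prefetching the next one."""
    results = []
    if num_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as transfer:
        pending = transfer.submit(load_block, 0)
        for b in range(num_blocks):
            current = pending.result()            # wait for block b
            if b + 1 < num_blocks:
                # Start fetching block b + 1 before computing on block b.
                pending = transfer.submit(load_block, b + 1)
            results.append(compute(current))      # overlaps with the fetch
    return results


data = [[1, 2], [3, 4], [5, 6]]
out = process_blocks(lambda b: data[b], sum, 3)
print(out)  # [3, 7, 11]
```

At any time at most two blocks are alive, the one being computed on and the one in flight, which mirrors the need for a second buffer.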

19 Optimizations on SPEs
Difficulties for the Batch OMP algorithm on Cell:
- The Cholesky matrix grows by one row in each iteration, so SIMD vectorization is not trivial.
- Only a subset of atoms is chosen, which requires gather operations on vectors and matrices.
- The number of coefficients is small but increasing; loops with short bodies and few iterations are inefficient due to in-order execution and the high branch miss penalty.
Applied techniques to increase performance:
- Dictionary size restricted to multiples of 8 and fixed at compile time, together with the maximum number of coefficients; this simplifies address calculation.
- SIMD vectorization and unrolling of loops.
- Register blocking to reduce load and store operations.

21 Performance Comparisons
Performance for denoising a 512 x 512 image with patches of size 8 x 8 and $C_{Den} = 20$ (mostly 4 atoms).
Table: Denoising of patches on QS22, with superposition and computation of weights for each SPE separately. Columns: SPEs | Runtime [ms] | Parallel efficiency [%]
Table: Denoising of patches on different architectures (computation of one final superposition and weights image). Columns: Cores / SPEs | Runtime [s] on QS22 | Runtime [s] on Nehalem | Runtime [s] on Penryn
- QS22: BladeCenter QS22, contains two PowerXCell 8i at 3.2 GHz
- Nehalem: Intel Core i7 CPU at 2.93 GHz
- Penryn: Intel Core 2 Quad CPU at 2.83 GHz
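Parallel efficiency is commonly defined as $E(p) = T_1 / (p \cdot T_p)$, where $T_1$ is the runtime on one processing element and $T_p$ the runtime on $p$ of them. The sketch below assumes this standard definition; the slides do not state which one was used. The runtimes in the example are made up purely for illustration and are not measurements from the talk.

```python
def parallel_efficiency(t_single, t_parallel, num_units):
    """Efficiency E(p) = T_1 / (p * T_p), as a fraction between 0 and 1."""
    return t_single / (num_units * t_parallel)


# Hypothetical runtimes in milliseconds, NOT taken from the slides.
t1_ms, t8_ms = 80.0, 11.0
efficiency = parallel_efficiency(t1_ms, t8_ms, 8)
print(f"{100 * efficiency:.1f} %")
```

An efficiency of 100 % would mean that doubling the number of SPEs halves the runtime.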

22 Denoising by K-SVD
Figure: (a) K-SVD 2D original, (b) K-SVD 2D denoised. Image provided by Anja Borsdorf, Siemens AG - Healthcare Sector.

23 Denoising by K-SVD
Figure: (a) Original, provided by A. Borsdorf, Siemens AG - Healthcare Sector, (b) K-SVD 2D, (c) K-SVD 2D Original.

24 Conclusion
- We optimized the Batch OMP algorithm on the Cell BE and applied it to CT image denoising.
- The denoising algorithm preserves edges and adapts to spatially varying noise.
- Parallel efficiency of the SPE computations is more than 97% on the PlayStation 3 and 87% on the QS22.
- The optimized version of denoising on Cell achieves more than 6 times higher per-core performance than a not hand-optimized (but well-written) OpenMP implementation on the latest multicore architectures.
- Sequential computations on the PPE cause performance stagnation (and even increasing runtime).

25 Related Literature
[BRH08] A. Borsdorf, R. Raupach, and J. Hornegger. Multiple CT-reconstructions for locally adaptive anisotropic wavelet denoising. International Journal of Computer Assisted Radiology and Surgery, 2(5).
[EA06] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12).
[Gsc07] M. Gschwind. The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor. International Journal of Parallel Programming, 35(3).
[RZE08] R. Rubinstein, M. Zibulevsky, and M. Elad. Efficient Implementation of the K-SVD Algorithm and the Batch-OMP Method. Technical Report CS Technion.


More information

Non-Stationary CT Image Noise Spectrum Analysis

Non-Stationary CT Image Noise Spectrum Analysis Non-Stationary CT Image Noise Spectrum Analysis Michael Balda, Björn J. Heismann,, Joachim Hornegger Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen Siemens Healthcare, Erlangen michael.balda@informatik.uni-erlangen.de

More information

Robot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss

Robot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for

More information

Graphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General

Graphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for

More information

Double-precision General Matrix Multiply (DGEMM)

Double-precision General Matrix Multiply (DGEMM) Double-precision General Matrix Multiply (DGEMM) Parallel Computation (CSE 0), Assignment Andrew Conegliano (A0) Matthias Springer (A00) GID G-- January, 0 0. Assumptions The following assumptions apply

More information

Sparse Reconstruction / Compressive Sensing

Sparse Reconstruction / Compressive Sensing Sparse Reconstruction / Compressive Sensing Namrata Vaswani Department of Electrical and Computer Engineering Iowa State University Namrata Vaswani Sparse Reconstruction / Compressive Sensing 1/ 20 The

More information

Sparse & Redundant Representation Modeling of Images: Theory and Applications

Sparse & Redundant Representation Modeling of Images: Theory and Applications Sparse & Redundant Representation Modeling of Images: Theory and Applications Michael Elad The Computer Science Department The Technion Haifa 3, Israel April 16 th 1, EE Seminar This research was supported

More information

Parallel Computing. Hwansoo Han (SKKU)

Parallel Computing. Hwansoo Han (SKKU) Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo

More information

Compressed Sensing Algorithm for Real-Time Doppler Ultrasound Image Reconstruction

Compressed Sensing Algorithm for Real-Time Doppler Ultrasound Image Reconstruction Mathematical Modelling and Applications 2017; 2(6): 75-80 http://www.sciencepublishinggroup.com/j/mma doi: 10.11648/j.mma.20170206.14 ISSN: 2575-1786 (Print); ISSN: 2575-1794 (Online) Compressed Sensing

More information

Image Restoration Using DNN

Image Restoration Using DNN Image Restoration Using DNN Hila Levi & Eran Amar Images were taken from: http://people.tuebingen.mpg.de/burger/neural_denoising/ Agenda Domain Expertise vs. End-to-End optimization Image Denoising and

More information

INCOHERENT DICTIONARY LEARNING FOR SPARSE REPRESENTATION BASED IMAGE DENOISING

INCOHERENT DICTIONARY LEARNING FOR SPARSE REPRESENTATION BASED IMAGE DENOISING INCOHERENT DICTIONARY LEARNING FOR SPARSE REPRESENTATION BASED IMAGE DENOISING Jin Wang 1, Jian-Feng Cai 2, Yunhui Shi 1 and Baocai Yin 1 1 Beijing Key Laboratory of Multimedia and Intelligent Software

More information

Weighted-CS for reconstruction of highly under-sampled dynamic MRI sequences

Weighted-CS for reconstruction of highly under-sampled dynamic MRI sequences Weighted- for reconstruction of highly under-sampled dynamic MRI sequences Dornoosh Zonoobi and Ashraf A. Kassim Dept. Electrical and Computer Engineering National University of Singapore, Singapore E-mail:

More information

Sparse & Redundant Representation Modeling of Images: Theory and Applications

Sparse & Redundant Representation Modeling of Images: Theory and Applications Sparse & Redundant Representation Modeling of Images: Theory and Applications Michael Elad The Computer Science Department The Technion Haifa 3, Israel The research leading to these results has been received

More information

QUATERNION-BASED SPARSE REPRESENTATION OF COLOR IMAGE. Licheng Yu, Yi Xu, Hongteng Xu, Hao Zhang

QUATERNION-BASED SPARSE REPRESENTATION OF COLOR IMAGE. Licheng Yu, Yi Xu, Hongteng Xu, Hao Zhang QUATERNION-BASED SPARSE REPRESENTATION OF COLOR IMAGE Licheng Yu, Yi Xu, Hongteng Xu, Hao Zhang Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China Shanghai Key Lab

More information

Parallel Implementation of Sparse Coding and Dictionary Learning on GPU

Parallel Implementation of Sparse Coding and Dictionary Learning on GPU Final Report Parallel Implementation of Sparse Coding and Dictionary Learning on GPU Huynh Manh Parallel Distributed System, CSCI 7551 Fall 2016 1. Introduction While the goal of sparse coding is to find

More information

GPU PARALLEL IMPLEMENTATION OF THE APPROXIMATE K-SVD ALGORITHM USING OPENCL. Paul Irofti, Bogdan Dumitrescu

GPU PARALLEL IMPLEMENTATION OF THE APPROXIMATE K-SVD ALGORITHM USING OPENCL. Paul Irofti, Bogdan Dumitrescu GPU PARALLEL IMPLEMENTATION OF THE APPROXIMATE K-SVD ALGORITHM USING OPENCL Paul Irofti, Bogdan Dumitrescu Department of Automatic Control and Computers University Politehnica of Bucharest 313 Spl. Independenţei,

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Cell Programming Tips & Techniques

Cell Programming Tips & Techniques Cell Programming Tips & Techniques Course Code: L3T2H1-58 Cell Ecosystem Solutions Enablement 1 Class Objectives Things you will learn Key programming techniques to exploit cell hardware organization and

More information

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System

More information

Signal and Image Recovery from Random Linear Measurements in Compressive Sampling

Signal and Image Recovery from Random Linear Measurements in Compressive Sampling Signal and Image Recovery from Random Linear Measurements in Compressive Sampling Sarah Mazari and Kamel Belloulata Abstract In many applications, including audio or digital imaging, the Nyquist rate is

More information

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh

More information

Implementation of a backprojection algorithm on CELL

Implementation of a backprojection algorithm on CELL Implementation of a backprojection algorithm on CELL Mario Koerner March 17, 2006 1 Introduction X-ray imaging is one of the most important imaging technologies in medical applications. It allows to look

More information

Trainlets: Dictionary Learning in High Dimensions

Trainlets: Dictionary Learning in High Dimensions 1 Trainlets: Dictionary Learning in High Dimensions Jeremias Sulam, Student Member, IEEE, Boaz Ophir, Michael Zibulevsky, and Michael Elad, Fellow, IEEE arxiv:102.00212v4 [cs.cv] 12 May 201 Abstract Sparse

More information

Dictionary Learning from Incomplete Data for Efficient Image Restoration

Dictionary Learning from Incomplete Data for Efficient Image Restoration Dictionary Learning from Incomplete Data for Efficient Image Restoration Valeriya Naumova Section for Computing and Software, Simula Research Laboratory, Martin Linges 25, Fornebu, Norway Email: valeriya@simula.no

More information

Multicore Challenge in Vector Pascal. P Cockshott, Y Gdura

Multicore Challenge in Vector Pascal. P Cockshott, Y Gdura Multicore Challenge in Vector Pascal P Cockshott, Y Gdura N-body Problem Part 1 (Performance on Intel Nehalem ) Introduction Data Structures (1D and 2D layouts) Performance of single thread code Performance

More information

Introduction to Topics in Machine Learning

Introduction to Topics in Machine Learning Introduction to Topics in Machine Learning Namrata Vaswani Department of Electrical and Computer Engineering Iowa State University Namrata Vaswani 1/ 27 Compressed Sensing / Sparse Recovery: Given y :=

More information

Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework walberla

Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework walberla Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework walberla SIAM PP 2016, April 13 th 2016 Martin Bauer, Florian Schornbaum, Christian Godenschwager, Johannes Hötzer,

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

Design of Efficient Clustering Dictionary for Sparse Representation of General Image Super Resolution

Design of Efficient Clustering Dictionary for Sparse Representation of General Image Super Resolution APSIPA ASC 2011 Xi an Design of Efficient Clustering Dictionary for Sparse Representation of General Image Super Resolution Seno Purnomo, Supavadee Aramvith, and Suree Pumrin Department of Electrical Engineering

More information

SHAPE-INCLUDED LABEL-CONSISTENT DISCRIMINATIVE DICTIONARY LEARNING: AN APPROACH TO DETECT AND SEGMENT MULTI-CLASS OBJECTS IN IMAGES

SHAPE-INCLUDED LABEL-CONSISTENT DISCRIMINATIVE DICTIONARY LEARNING: AN APPROACH TO DETECT AND SEGMENT MULTI-CLASS OBJECTS IN IMAGES SHAPE-INCLUDED LABEL-CONSISTENT DISCRIMINATIVE DICTIONARY LEARNING: AN APPROACH TO DETECT AND SEGMENT MULTI-CLASS OBJECTS IN IMAGES Mahdi Marsousi, Xingyu Li, Konstantinos N. Plataniotis Multimedia Lab,

More information

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS WITH BLOCK COSPARSITY

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS WITH BLOCK COSPARSITY LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS WITH BLOCK COSPARSITY Bihan Wen, Saiprasad Ravishankar, and Yoram Bresler Department of Electrical and Computer Engineering and the Coordinated Science Laboratory,

More information

Collaborative Sparsity and Compressive MRI

Collaborative Sparsity and Compressive MRI Modeling and Computation Seminar February 14, 2013 Table of Contents 1 T2 Estimation 2 Undersampling in MRI 3 Compressed Sensing 4 Model-Based Approach 5 From L1 to L0 6 Spatially Adaptive Sparsity MRI

More information

arxiv: v1 [cs.cv] 7 Jul 2017

arxiv: v1 [cs.cv] 7 Jul 2017 A MULTI-LAYER IMAGE REPRESENTATION USING REGULARIZED RESIDUAL QUANTIZATION: APPLICATION TO COMPRESSION AND DENOISING Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov Department of Computer Science,

More information

IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING

IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING Idan Ram, Michael Elad and Israel Cohen Department of Electrical Engineering Department of Computer Science Technion - Israel Institute of Technology

More information

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

More information

TERM PAPER ON The Compressive Sensing Based on Biorthogonal Wavelet Basis

TERM PAPER ON The Compressive Sensing Based on Biorthogonal Wavelet Basis TERM PAPER ON The Compressive Sensing Based on Biorthogonal Wavelet Basis Submitted By: Amrita Mishra 11104163 Manoj C 11104059 Under the Guidance of Dr. Sumana Gupta Professor Department of Electrical

More information