Research Article Parallel Computing of Patch-Based Nonlocal Operator and Its Application in Compressed Sensing MRI

Similar documents
Structure-adaptive Image Denoising with 3D Collaborative Filtering

Cardiac diffusion tensor imaging based on compressed sensing using joint sparsity and low-rank approximation

G Practical Magnetic Resonance Imaging II Sackler Institute of Biomedical Sciences New York University School of Medicine. Compressed Sensing

Efficient MR Image Reconstruction for Compressed MR Imaging

The Benefit of Tree Sparsity in Accelerated MRI

Total Variation Regularization Method for 3D Rotational Coronary Angiography

MRI reconstruction from partial k-space data by iterative stationary wavelet transform thresholding

29 th NATIONAL RADIO SCIENCE CONFERENCE (NRSC 2012) April 10 12, 2012, Faculty of Engineering/Cairo University, Egypt

Total Variation Regularization Method for 3-D Rotational Coronary Angiography

Compressed Sensing for Rapid MR Imaging

Weighted-CS for reconstruction of highly under-sampled dynamic MRI sequences

Image Interpolation using Collaborative Filtering

NIH Public Access Author Manuscript Med Phys. Author manuscript; available in PMC 2009 March 13.

COMPRESSED SENSING MRI WITH BAYESIAN DICTIONARY LEARNING.

Supplemental Material for Efficient MR Image Reconstruction for Compressed MR Imaging

Compressed Sensing Reconstructions for Dynamic Contrast Enhanced MRI

Compressed Sensing Algorithm for Real-Time Doppler Ultrasound Image Reconstruction

MAGNETIC resonance imaging (MRI) is an important

Sparse sampling in MRI: From basic theory to clinical application. R. Marc Lebel, PhD Department of Electrical Engineering Department of Radiology

Compressed Sensing MRI with Multichannel Data Using Multicore Processors

Iterative CT Reconstruction Using Curvelet-Based Regularization

Image Denoising based on Adaptive BM3D and Singular Value

Improving the quality of compressed sensing MRI that exploits adjacent slice similarity

Compressive Sensing MRI with Wavelet Tree Sparsity

Constrained Reconstruction of Sparse Cardiac MR DTI Data

Redundancy Encoding for Fast Dynamic MR Imaging using Structured Sparsity

Denoising Methods for MRI Imagery

IMAGE DENOISING USING NL-MEANS VIA SMOOTH PATCH ORDERING

An Optimized Pixel-Wise Weighting Approach For Patch-Based Image Denoising

6480(Print), ISSN (Online) Volume 4, Issue 7, November December (2013), IAEME AND TECHNOLOGY (IJARET)

Locally Adaptive Regression Kernels with (many) Applications

Collaborative Sparsity and Compressive MRI

Institute of Cardiovascular Science, UCL Centre for Cardiovascular Imaging, London, United Kingdom, 2

Introduction. Wavelets, Curvelets [4], Surfacelets [5].

arxiv: v1 [cs.cv] 6 Jun 2017

Compressive Sensing Algorithms for Fast and Accurate Imaging

Main Menu. Summary. sampled) f has a sparse representation transform domain S with. in certain. f S x, the relation becomes

EE123 Digital Signal Processing

Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

PERFORMANCE ANALYSIS BETWEEN TWO SPARSITY-CONSTRAINED MRI METHODS: HIGHLY CONSTRAINED BACKPROJECTION (HYPR) AND

Compressive Sensing for Multimedia. Communications in Wireless Sensor Networks

MARS: Multiple Atlases Robust Segmentation

Optimal Sampling Geometries for TV-Norm Reconstruction of fmri Data

Incoherent noise suppression with curvelet-domain sparsity Vishal Kumar, EOS-UBC and Felix J. Herrmann, EOS-UBC

Projected Iterative Soft-thresholding Algorithm for Tight Frames in Compressed Sensing Magnetic Resonance Imaging

IMAGE DENOISING TO ESTIMATE THE GRADIENT HISTOGRAM PRESERVATION USING VARIOUS ALGORITHMS

6 credits. BMSC-GA Practical Magnetic Resonance Imaging II

Clustered Compressive Sensing: Application on Medical Imaging

Open Access Research on the Prediction Model of Material Cost Based on Data Mining

SUPPLEMENTARY MATERIAL

MARS: Multiple Atlases Robust Segmentation

Spread Spectrum Using Chirp Modulated RF Pulses for Incoherent Sampling Compressive Sensing MRI

Development of fast imaging techniques in MRI From the principle to the recent development

Automated Aperture GenerationQuantitative Evaluation of Ape

Enhao Gong, PhD Candidate, Electrical Engineering, Stanford University Dr. John Pauly, Professor in Electrical Engineering, Stanford University Dr.

Research Article CT Image Reconstruction from Sparse Projections Using Adaptive TpV Regularization

Inverse problem regularization using adaptive signal models

Nonlocal transform-domain denoising of volumetric data with groupwise adaptive variance estimation

Denoising an Image by Denoising its Components in a Moving Frame

Research Article Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods

Image Denoising via Group Sparse Eigenvectors of Graph Laplacian

Non-Local Compressive Sampling Recovery

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Patch-based Directional Redundant Wavelets in Compressed Sensing Parallel MRI. with Radial Sampling Trajectory

Available Online through

Adjacent Zero Communication Parallel Cloud Computing Method and Its System for

MODEL-BASED FREE-BREATHING CARDIAC MRI RECONSTRUCTION USING DEEP LEARNED & STORM PRIORS: MODL-STORM

Image Restoration Using DNN

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference

Combination of Parallel Imaging and Compressed Sensing for high acceleration factor at 7T

IMAGE RESTORATION VIA EFFICIENT GAUSSIAN MIXTURE MODEL LEARNING

Patch-Based Color Image Denoising using efficient Pixel-Wise Weighting Techniques

REPORT DOCUMENTATION PAGE

Tomographic reconstruction: the challenge of dark information. S. Roux

An Effective Denoising Method for Images Contaminated with Mixed Noise Based on Adaptive Median Filtering and Wavelet Threshold Denoising

Research Article Path Planning Using a Hybrid Evolutionary Algorithm Based on Tree Structure Encoding

Signal Reconstruction from Sparse Representations: An Introdu. Sensing

Single Breath-hold Abdominal T 1 Mapping using 3-D Cartesian Sampling and Spatiotemporally Constrained Reconstruction

Sparse Component Analysis (SCA) in Random-valued and Salt and Pepper Noise Removal

Deconvolution with curvelet-domain sparsity Vishal Kumar, EOS-UBC and Felix J. Herrmann, EOS-UBC

Whole Body MRI Intensity Standardization

4D Computed Tomography Reconstruction from Few-Projection Data via Temporal Non-local Regularization

Exploiting Non-local Low Rank Structure in Image Reconstruction

COORDINATE MEASUREMENTS OF COMPLEX-SHAPE SURFACES

TERM PAPER ON The Compressive Sensing Based on Biorthogonal Wavelet Basis

DENOISING OF COMPUTER TOMOGRAPHY IMAGES USING CURVELET TRANSFORM

arxiv: v1 [cs.cv] 23 Sep 2017

Spatial-temporal Total Variation Regularization (STTVR) for 4D-CT Reconstruction

Computational Photography Denoising

ADAPTIVE LOW RANK AND SPARSE DECOMPOSITION OF VIDEO USING COMPRESSIVE SENSING

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

Open Access Noisy Radial-blurred Images Restoration Using the BM3D Frames

Comparison of Wavelet Based Watermarking Techniques for Various Attacks

Compressive Sensing Based Image Reconstruction using Wavelet Transform

Joint Enhancement and Denoising Method via Sequential Decomposition

Blind Sparse Motion MRI with Linear Subpixel Interpolation

A Functional Connectivity Inspired Approach to Non-Local fmri Analysis

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents

Transcription:

Hindawi Publishing Corporation Computational and Mathematical Methods in Medicine Volume 2014, Article ID 257435, 6 pages http://dx.doi.org/10.1155/2014/257435 Research Article Parallel Computing of Patch-Based Nonlocal Operator and Its Application in Compressed Sensing MRI Qiyue Li, 1 Xiaobo Qu, 1 Yunsong Liu, 1 Di Guo, 2 Jing Ye, 1 Zhifang Zhan, 1 and Zhong Chen 1 1 Departments of Communication Engineering and Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance Research, Xiamen University, Xiamen 361005, China 2 School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China Correspondence should be addressed to Xiaobo Qu; quxiaobo@xmu.edu.cn and Zhong Chen; chenz@xmu.edu.cn Received 1 April 2014; Accepted 27 April 2014; Published 20 May 2014 Academic Editor: Peng Feng Copyright 2014 Qiyue Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Magnetic resonance imaging has been benefited from compressed sensing in improving imaging speed. But the computation time of compressed sensing magnetic resonance imaging (CS-MRI) is relatively long due to its iterative reconstruction process. Recently, a patch-based nonlocal operator (PANO) has been applied in CS-MRI to significantly reduce the reconstruction error by making use of self-similarity in images. But the two major steps in PANO, learning similarities and performing 3D wavelet transform, require extensive computations. In this paper, a parallel architecture based on multicore processors is proposed to accelerate computations of PANO. Simulation results demonstrate that the acceleration factor approaches the number of CPU cores and overall PANO-based CS-MRI reconstruction can be accomplished in several seconds. 1. Introduction The slow imaging speed is one of major concerns in magnetic resonance imaging (MRI). Compressed sensing MRI (CS- MRI) has shown the ability to effectively reduce the imaging time. Numerous researches have been conducted on CS-MRI [1 4] in the past few years. As one key assumption of CS-MRI, MR images are sparse in some transform domains [5, 6]and the sparsity seriously affects the reconstructed images. Unlike using predefined basis, such as wavelets, to represent MRI images, self-similarity of an MR image has been introduced into CS-MRI [7 10] with the help of nonlocal means (NLM) [11] or block matching and 3D filtering [12]. By modeling the similar patches embedded in MR images with a linear representation, a patch-based nonlocal operator (PANO) [13] has beenshowntooutperformtypicaltotalvariation,redundant wavelets, and some other methods. PANO can be viewed as an alternative form of the block matching 3D frames [14]. It explores the similarities of nonlocal image patches and allows integrating conventional transforms such as wavelet, discrete cosine wavelet, or discrete Fourier transform to further exploit the sparsity of grouped similar image patches. However, PANO usually suffers from massive computations because of high overlapping among the patches. Fortunately, learning patch similarities and forward and backward transformations among grouped patches are independent of each other. This allows reducing the reconstruction time by taking the advantage of the technology of parallel computing [15, 16]. In this paper, we design a parallel architecture to accelerate the computations of PANO. There are two main steps in PANO [13], learning similarities and performing 3D wavelet transform on grouped patches. Simulation results demonstrate that the parallel architecture can effectively accelerate the computation speed by making use of multiple CPU cores. This parallel architecture is also widely applicable to the self-similarity-based image reconstruction methods. 2. Methods As mentioned above, there are two major steps in PANO, learning similarities and performing 3D wavelet transform on grouped patches.

2 Computational and Mathematical Methods in Medicine We first review the process of learning similarities. For a specified patch, PANO is to find Q-1 similarpatches(the shorter the Euclidean distance is, the more similar they are) in a search region centered in the specified patch [13]. After the similarity information of a patch is learnt, the Q patches (including the specified patch and Q-1 similar patches) will be stackedintoa3dcubefromthemostsimilaronetotheleast, as is shown in Figure 1. As are created, 3D wavelet transform can be appliedtothem.first,perform2dforwardwavelettransform in X-Y dimension and then do 1D forward wavelet transform in Z dimension. After the forward wavelet transform is done, a soft threshold operator is applied. Following 3D backward wavelet transform, are assembled back to the image. The two major steps in PANO are both patch-based, and each patch can be handled independently. Therefore, parallel computing can be a good solution to accelerate the computations. More and more image processing applications that exhibit a high degree of parallelism are showing great interests in parallel computing [16]. Parallelization can be easily implemented on multicore central processing unit (CPU) with existing application program interfaces (APIs) such as POSIX threads and OpenMP [15]. Meanwhile, many applications are making use of graphic processing unit (GPU) [17], which is good at doing complex computations. Due to the simplicity, flexible interface, and multiplatform supports, we choose OpenMP as our API to develop parallel computing program on CPU, which is the common configuration for personal computers. 2.1.ParallelizationonLearningSimilarities. In order to make full use of a multicore CPU, tasks should be properly generatedandassignedtocpucores[16]. If there are too many tasks, CPU cores will switch frequently to handle the tasks and the cost for switching between CPU cores is expensive. Therefore, generating a task that is based on a single patch is suboptimal. Instead, generating a task that processes a serial of image patches in one column or one row can be better,asisshowninfigure 2. Becausegeneratingtasks according to columns is the same as according to rows, we choose columns as an example in this paper. Computations in one task are serial while each task will be processed in parallel. After one task is done, computations on the patches of the corresponding column will be finished. That is, all the similarity information for the patches in that column is learnt. CPU cores can be effectively utilized when the number of columnsinanimageismuchlargerthanthenumberofcpu cores in a personal computer. 2.2.Parallelizationon3DWaveletTransformonCubes. After similarity information is learnt, can be easily created, as is shown in Figure 1. The3Dwavelettransform oncubessharesthesameparallelarchitectureassection 2.1. One task gets out the similarity information of patches in one column. After cubes are created according to the learnt similarity information, wavelet transforms will be applied on them. Computations on cubes within each column of patches will be serial and tasks will be processed in parallel, as is shown in Figure 3. Q1 Q7 Q6 Q2 Q3 Q5 T Q4 Search region Q=8 Z Y X Q7 Q6 Q5 Q4 Q3 Q2 Q1 T Figure 1: Search Q-1 = 7 similar patches in the search region and stack them with the basic patch T into a 3D cube. Search region Patch Task Image Figure 2: Generating tasks according to image columns. One task corresponds to one column of patches. 3. Results Experiments are conducted in three aspects. First, the effectivenessoftheparallelarchitectureonapersonalcomputer is discussed. Second, the performance on a professional workstation with 100 CPU cores will be explored. At last, a completeprocessinpano-basedcs-mriwillbeaccomplished to see how much the improvement will be in iterative CS-MRI reconstructions. Every experiment is repeated five times, and the reported computation time is the average of them. 3.1. Parallelization Improvement on a Personal Computer. The experiments are conducted on DELL T1700 personal workstation with E3-1225v3 CPU (4 cores, 3.2 GHz). 64-bit Windows 7 is our operating system. MATLAB version is 2013B with MEX compiler from Visual Studio 2012. Patch size is set to be 8 8,searchregionis39 39, sliding

Computational and Mathematical Methods in Medicine 3 Thread 1 Thread 2 2D image Similarity information.. 2D image Thread J-1 Thread J Figure 3: Parallel architecture of. (a) (b) (c) (d) (e) (f) (g) Figure 4: Dataset. (a), (c), (e), (f), and (g) T2 weighted MR brain images, (b) one frame of cardiac MR images, and (d) a water phantom image. step is 1, and the number Q is 8. Sliding step is a moving step to select the next patch. Haar wavelet is chosen as the sparsifying transform due to its efficiency and simplicity. Seven images shown in Figure 4 are tested. Figures 4(a), 4(c), 4(e), 4(f), and4(g) are T2 weighted brain images acquired from a healthy volunteer at a 3T Siemens Trio Tim MRI scanner using the T2 weighted turbo spin echo sequence (TR/TE = 6100/99 ms, 220 220 mm field of view, and 3 mm slice thickness). Figure 4(b) is a cardiac image downloaded from Bio Imaging Signal Processing Lab [18, 19]. Figure 4(d) is a water phantom image acquired at 7T Varian MRI system (Varian, Palo Alto, CA, USA) with the spin echo sequence (TR/TE = 2000/100 ms, 80 80 mm field of view, and 2 mm slice thickness). 3.1.1. Accelerating in Learning Similarities. As is shown in Figure5, computation time for learning similarities of Figure 4(a) is effectively decreased with the increasing number of CPU cores in use. It demonstrates that the parallelization is effective. However, the accelerating factor cannot be exactly the same as the number of CPU cores. One reason is that it needs to create and assign tasks to CPU cores, and this activity requires extra time. Furthermore, tasks may not be finished exactly within the same time. Therefore, it is reasonable that the accelerating factor is between 3 and 4 with 4 CPU cores. 3.1.2. Accelerating in 3D Wavelet Transform on Cubes. As is shown in Figure 6, parallelization on 3D wavelet transform on cubes of Figure 4(a) is also very effective. Besides the cost introduced in Section 3.1.1 for parallelization, it also needs time to create cubes and reassemble them back to the image. Therefore the accelerating factor is smaller than four when there are 4 CPU cores in use. 3.1.3. Test Results on More MRI Images. The remaining images in Figure 4 aretestedandthecomputationtimeis shown in Figure 7. Because Figures 4(b) 4(g) are complex images, the computation time of 3D wavelet transform

4 Computational and Mathematical Methods in Medicine 5.00 4.00 3.00 2.00 4.502 2.314 1.243 5.00 4.00 3.00 2.00 Computation time with 1 core 4.589 4.615 4.677 4.630 4.635 4.632 2.135 2.112 2.313 2.097 2.092 2.118 Fig. 4(b) Fig. 4(c) Fig. 4(d) Fig. 4(e) Fig. 4(f) Fig. 4(g) One core Two cores Four cores Figure 5: Computation time of learning similarities for Figure 4(a) with different number of CPU cores in use. 3.00 (a) Computation time with 2 cores 2.50 2.330 2.359 2.408 2.353 2.348 2.367 1.50 1.120 2.00 1.50 0.50 1.296 1.305 1.247 1.147 1.133 1.139 0.622 Fig. 4(b) Fig. 4(c) Fig. 4(d) Fig. 4(e) Fig. 4(f) Fig. 4(g) 0.50 0.387 (b) One core Two cores Four cores Figure 6: Computation time of for Figure 4(a) with different number of CPU cores in use. on cubes is approximately twice the consuming time for Figure 4(a), as real and imaginary parts are computed separately. The computation time is comparable for different images, implying that the computations do not depend on the imagestructures.furthermore,asitcanbeseeninfigures 7(a), 7(b),and7(c), the acceleration is effective with different images when testing with various CPU cores. 1.40 1.20 0.80 0.60 0.40 0.20 1.219 1.238 1.239 1.241 1.224 Fig. 4(b) 1.272 0.766 0.746 0.757 0.779 0.768 0.766 Fig. 4(c) Computation time with 4 cores Fig. 4(d) (c) Fig. 4(e) Fig. 4(f) Fig. 4(g) 3.2. Parallelization Improvement on a Professional Workstation. The effectiveness on a professional workstation is discussed in this section. Experiments are executed on IBM x3850 with ten E7-8870 CPUs. As shown in Figures 8 and 9, both learning similarities and are effectively accelerated with 100 cores, and 128 threads are tested due to hyperthreading technology [20]. 3.3. Parallelization Improvement on PANO-Based CS-MRI. The full reconstruction of PANO-based CS-MRI is discussed in this section. Experiments conducted with the sliding steps are 8 and 4, respectively, and 40% of data are sampled. With a typical setting, similarity is learnt twice, and 3D wavelet transform on cubes will be called about 160 times. As introducedin [13, Algorithm 2], setting the times of updating Figure 7: Computation time with various cores for (b), (c), (d), (e), (f), and (g) infigure 4. the guide image as 2 will improve results. Therefore similarity is learnt twice. In [13, Algorithm 1], the assembled image needs to be updated many times to get a sufficient large β. Besides, the 3D wavelet transform performed on cubes will be called about 160 times. As is shown in Table 1, both learning similarities and 3D wavelet transform on cubes are accelerated effectively nearly 4 times with sliding step being 4 and 4 CPU cores in use. When sliding step is 8, a full reconstruction can be accomplished in about 3 seconds. Because computations are very fast and

Computational and Mathematical Methods in Medicine 5 Table 1: Computation time of PANO-based CS-MRI (unit: seconds). Sliding step = 8 Sliding step = 4 One core Four cores One core Four cores 0.077 0.039 0.578 0.153 7.910 2.438 31.684 8.915 Other computations 0.531 0.546 0.560 0.575 Total reconstruction time 8.518 3.023 32.822 9.643 Time (ms) 4000 3500 3000 2500 2000 1500 1000 500 0 1 2 4 8 16 32 64 128 Number of threads Figure 8: Computation time of learning similarities on a professional workstation forfigure 4(a). Time (ms) 2500 2000 1500 1000 500 0 1 2 4 8 16 32 64 128 Number of threads Figure 9: Computation time of on a professional workstation for Figure 4(a). the cost for parallelization is relatively expensive, learning similarities are accelerated less than 3 times. 4. Conclusion In this paper, the patch-based nonlocal operator (PANO) has been parallelized to accelerate its two major steps, learning similarities and performing 3D wavelet transform on grouped similar patches. Experiments conducted on both the personal and professional computers have proven the effectiveness and applicability of the parallel architecture. Results demonstrate that a full PANO-based compressed sensing MRI reconstruction can be accomplished in several seconds. The parallel architecture of PANO is also applicable for other image reconstruction problems [21]. In the future, how to maximize the acceleration for computations of PANO on 3D imaging with multicore CPUs and GPUs will be developed. Conflict of Interests The authors declare that there is no conflict of interests regarding the publication of this paper. Acknowledgments The authors would like to thank Dr. Bingwen Zheng and Mr. Qianjing Shen for the valuable discussions. The authors aregratefultodr.michaellustigforsharingsparemri code and Dr. Jong Chul Ye for sharing cardiac data used in Figure 4(b). This work was partially supported by the NNSF of China (61201045, 61302174, and 11375147), Fundamental Research Funds for the Central Universities (2013SH002), Scientific Research Foundation for the Introduction of Talent at Xiamen University of Technology (YKJ12021R), and Open Fund from Key Lab of Digital Signal and Image Processing of Guangdong Province (2013GDDSIPL-07 and 54600321). References [1] N. Cai, S. R. Wang, S.-S. Zhu, and D. Liang, Accelerating dynamic cardiac MR imaging using structured sparse representation, Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 160139, 8 pages, 2013. [2] M.Lustig,D.L.Donoho,J.M.Santos,andJ.M.Pauly, Compressed sensing MRI, IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 72 82, 2008. [3] M. Lustig, D. Donoho, and J. M. Pauly, Sparse MRI: the application of compressed sensing for rapid MR imaging, Magnetic Resonance in Medicine, vol.58,no.6,pp.1182 1195, 2007. [4] M. Chen, D. Mi, P. He, L. Deng, and B. Wei, A CT reconstruction algorithm based on L 1/2 regularization, Computational and Mathematical Methods in Medicine,vol.2014,ArticleID862910, 8pages,2014. [5] D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory,vol.52,no.4,pp.1289 1306,2006. [6] E. J. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory,vol.52,no.2,pp.489 509,2006.

6 Computational and Mathematical Methods in Medicine [7] K.Lu,N.He,andL.Li, Nonlocalmeans-baseddenoisingfor medical images, Computational and Mathematical Methods in Medicine,vol.2012,ArticleID438617,7pages,2012. [8] M. Akçakaya,T.A.Basha,B.Godduetal., Low-dimensionalstructure self-learning and thresholding: regularization beyond compressed sensing for MRI Reconstruction, Magnetic Resonance in Medicine,vol.66,no.3,pp.756 767,2011. [9] G. Adluru, T. Tasdizen, M. C. Schabel, and E. V. R. Dibella, Reconstruction of 3D dynamic contrast-enhanced magnetic resonance imaging using nonlocal means, JournalofMagnetic Resonance Imaging,vol.32,no.5,pp.1217 1227,2010. [10] K. Egiazarian, A. Foi, and V. Katkovnik, Compressed sensing image reconstruction via recursive spatially adaptive filtering, in Proceedingsofthe14thIEEEInternationalConferenceon Image Processing (ICIP 07), vol.1,pp.549 552,SanAntonio, Tex, USA, September 2007. [11] A. Buades, B. Coll, and J. M. Morel, A review of image denoising algorithms, with a new one, Multiscale Modeling & Simulation, vol. 4, no. 2, pp. 490 530, 2005. [12] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080 2095, 2007. [13] X. Qu, Y. Hou, F. Lam, D. Guo, J. Zhong, and Z. Chen, Magnetic resonance image reconstruction from undersampled measurements using a patch-based nonlocal operator, Medical Image Analysis,2013. [14] A. Danielyan, V. Katkovnik, and K. Egiazarian, BM3D frames and variational image deblurring, IEEE Transactions on Image Processing,vol.21,no.4,pp.1715 1728,2012. [15] R. Shams, P. Sadeghi, R. A. Kennedy, and R. I. Hartley, A survey of medical image registration on multicore and the GPU, IEEE Signal Processing Magazine,vol.27,no.2,pp.50 60,2010. [16] D. Kim, V. W. Lee, and Y.-K. Chen, Image processing on multicore x86 architectures, IEEE Signal Processing Magazine, vol.27,no.2,pp.97 107,2010. [17] S. S. Stone, J. P. Haldar, S. C. Tsao, W.-M. W. Hwu, B. P. Sutton, and Z.-P. Liang, Accelerating advanced MRI reconstructions on GPUs, Journal of Parallel and Distributed Computing, vol. 68, no. 10, pp. 1307 1318, 2008. [18] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye, kt FOCUSS: a general compressed sensing framework for high resolution dynamic MRI, Magnetic Resonance in Medicine,vol. 61,no.1,pp.103 116,2009. [19] Fig. 4(b), http://bisp.kaist.ac.kr/ktfocuss.htm. [20] HT Technology, http://en.wikipedia.org/wiki/hyper-threading. [21] D. Guo, X. Qu, X. Du, K. Wu, and X. Chen, Salt and pepper noise removal with noise detection and a patch-based sparse representation, Advances in Multimedia, vol. 2014, ArticleID 682747, 14 pages, 2014.