Parallel iterative cone beam CT image reconstruction on a PC cluster

Similar documents
A numerical simulator in VC++ on PC for iterative image reconstruction

Assessment of OSEM & FBP Reconstruction Techniques in Single Photon Emission Computed Tomography Using SPECT Phantom as Applied on Bone Scintigraphy

Unmatched Projector/Backprojector Pairs in an Iterative Reconstruction Algorithm

A Fast Implementation of the Incremental Backprojection Algorithms for Parallel Beam Geometries5

Feldkamp-type image reconstruction from equiangular data

Workshop on Quantitative SPECT and PET Brain Studies January, 2013 PUCRS, Porto Alegre, Brasil Corrections in SPECT and PET

A Weighted Least Squares PET Image Reconstruction Method Using Iterative Coordinate Descent Algorithms

A Projection Access Scheme for Iterative Reconstruction Based on the Golden Section

SINGLE-PHOTON emission computed tomography

Axial block coordinate descent (ABCD) algorithm for X-ray CT image reconstruction

A Parallel Implementation of the Katsevich Algorithm for 3-D CT Image Reconstruction

A Fast GPU-Based Approach to Branchless Distance-Driven Projection and Back-Projection in Cone Beam CT

GPU implementation for rapid iterative image reconstruction algorithm

Accelerated C-arm Reconstruction by Out-of-Projection Prediction

Iterative and analytical reconstruction algorithms for varying-focal-length cone-beam

Parallelism of iterative CT reconstruction based on local reconstruction algorithm

SPECT reconstruction

3/27/2012 WHY SPECT / CT? SPECT / CT Basic Principles. Advantages of SPECT. Advantages of CT. Dr John C. Dickson, Principal Physicist UCLH

Temperature Distribution Measurement Based on ML-EM Method Using Enclosed Acoustic CT System

Slab-by-Slab Blurring Model for Geometric Point Response Correction and Attenuation Correction Using Iterative Reconstruction Algorithms

A Comparison of the Uniformity Requirements for SPECT Image Reconstruction Using FBP and OSEM Techniques

Gengsheng Lawrence Zeng. Medical Image Reconstruction. A Conceptual Tutorial

Reconstruction from Projections

Tomographic Reconstruction

Spiral ASSR Std p = 1.0. Spiral EPBP Std. 256 slices (0/300) Kachelrieß et al., Med. Phys. 31(6): , 2004

An object-oriented library for 3D PET reconstruction using parallel computing

DEVELOPMENT OF CONE BEAM TOMOGRAPHIC RECONSTRUCTION SOFTWARE MODULE

Acknowledgments and financial disclosure

USING cone-beam geometry with pinhole collimation,

Determination of Three-Dimensional Voxel Sensitivity for Two- and Three-Headed Coincidence Imaging

RECENTLY, biomedical imaging applications of

Index. aliasing artifacts and noise in CT images, 200 measurement of projection data, nondiffracting

S rect distortions in single photon emission computed

Medical Image Analysis

CT NOISE POWER SPECTRUM FOR FILTERED BACKPROJECTION AND ITERATIVE RECONSTRUCTION

The Near Future in Cardiac CT Image Reconstruction

Advanced Scatter Correction for Quantitative Cardiac SPECT

high performance medical reconstruction using stream programming paradigms

Expectation Maximization and Total Variation Based Model for Computed Tomography Reconstruction from Undersampled Data

An approximate cone beam reconstruction algorithm for gantry-tilted CT

3-D Monte Carlo-based Scatter Compensation in Quantitative I-131 SPECT Reconstruction

High Resolution Iterative CT Reconstruction using Graphics Hardware

Review of PET Physics. Timothy Turkington, Ph.D. Radiology and Medical Physics Duke University Durham, North Carolina, USA

Validation of GEANT4 for Accurate Modeling of 111 In SPECT Acquisition

Monte-Carlo-Based Scatter Correction for Quantitative SPECT Reconstruction

Optimization of Cone Beam CT Reconstruction Algorithm Based on CUDA

Rapid Emission Tomography Reconstruction

Multi-slice CT Image Reconstruction Jiang Hsieh, Ph.D.

Abstract I. INTRODUCTION

(RMSE). Reconstructions showed that modeling the incremental blur improved the resolution of the attenuation map and quantitative accuracy.

Impact of X-ray Scatter When Using CT-based Attenuation Correction in PET: A Monte Carlo Investigation

Artifact Mitigation in High Energy CT via Monte Carlo Simulation

Adaptive algebraic reconstruction technique

FASART: An iterative reconstruction algorithm with inter-iteration adaptive NAD filter

664 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 52, NO. 3, JUNE 2005

Adaptive region of interest method for analytical micro-ct reconstruction

AN ELLIPTICAL ORBIT BACKPROJECTION FILTERING ALGORITHM FOR SPECT

Interior Reconstruction Using the Truncated Hilbert Transform via Singular Value Decomposition

GPU-based Fast Cone Beam CT Reconstruction from Undersampled and Noisy Projection Data via Total Variation

Joint ICTP-TWAS Workshop on Portable X-ray Analytical Instruments for Cultural Heritage. 29 April - 3 May, 2013

SPECT (single photon emission computed tomography)

Splitting-Based Statistical X-Ray CT Image Reconstruction with Blind Gain Correction

Spectral analysis of non-stationary CT noise

Consistency in Tomographic Reconstruction by Iterative Methods

Enhancement Image Quality of CT Using Single Slice Spiral Technique

Noise weighting with an exponent for transmission CT

Beam Attenuation Grid Based Scatter Correction Algorithm for. Cone Beam Volume CT

Computed Tomography. Principles, Design, Artifacts, and Recent Advances. Jiang Hsieh THIRD EDITION. SPIE PRESS Bellingham, Washington USA

IN single photo emission computed tomography (SPECT)

CLASS HOURS: 4 CREDIT HOURS: 4 LABORATORY HOURS: 0

Méthodes d imagerie pour les écoulements et le CND

Constructing System Matrices for SPECT Simulations and Reconstructions

A Backprojection-Filtration Algorithm for Nonstandard. Spiral Cone-beam CT with an N-PI Window

Cover Page. The handle holds various files of this Leiden University dissertation

Algebraic Iterative Methods for Computed Tomography

Top-level Design and Pilot Analysis of Low-end CT Scanners Based on Linear Scanning for Developing Countries

NIH Public Access Author Manuscript Med Phys. Author manuscript; available in PMC 2009 March 13.

Non-Homogeneous Updates for the Iterative Coordinate Descent Algorithm

Iterative SPECT reconstruction with 3D detector response

Implementation of a backprojection algorithm on CELL

The Design and Implementation of COSEM, an Iterative Algorithm for Fully 3-D Listmode Data

Central Slice Theorem

NIH Public Access Author Manuscript Int J Imaging Syst Technol. Author manuscript; available in PMC 2010 September 1.

THE FAN-BEAM scan for rapid data acquisition has

Q.Clear. Steve Ross, Ph.D.

Multilevel Optimization for Multi-Modal X-ray Data Analysis

On the Efficiency of Iterative Ordered Subset Reconstruction Algorithms for Acceleration on GPUs

On the Use of Graphics Hardware to Accelerate Algebraic Reconstruction Methods

Reduction of Metal Artifacts in Computed Tomographies for the Planning and Simulation of Radiation Therapy

Introduction to Positron Emission Tomography

Advanced Image Reconstruction Methods for Photoacoustic Tomography

An Iterative Approach to the Beam Hardening Correction in Cone Beam CT (Proceedings)

An Efficient Technique For Multi-Phase Model Based Iterative Reconstruction

Non-Stationary CT Image Noise Spectrum Analysis

Generalized Filtered Backprojection for Digital Breast Tomosynthesis Reconstruction

EM+TV Based Reconstruction for Cone-Beam CT with Reduced Radiation

MEDICAL IMAGE ANALYSIS

Tomographic Algorithm for Industrial Plasmas

Improvement of Efficiency and Flexibility in Multi-slice Helical CT

Modeling and Incorporation of System Response Functions in 3D Whole Body PET

Transcription:

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 1 Journal of X-Ray Science and Technology 13 (2005) 1 10 1 IOS Press Parallel iterative cone beam CT image reconstruction on a PC cluster Xiang Li a, Jun Ni c,d and Ge Wang a,b a Department of Biomedical Engineering, The University of Iowa, Iowa City, IA 52242, USA b Department of Radiology, The University of Iowa, Iowa City, IA 52242, USA c Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA d Research Services, ITS, The University of Iowa, Iowa City, IA 52242, USA E-mail: {xiang-li, jun-ni, ge-wang}@uiowa.edu Received 30 June 2004 Revised 30 July 2004 Accepted 7 August 2004 Abstract. The iterative reconstruction (IR) algorithms are able to generate CT images with higher quality compared with the filtered backprojection method when the projection data is noisy or the radiation dose is low. IR can also be used when the data is incomplete. The major disadvantage of the IR is its high demand on computation and slow reconstruction. To improve the time performance of the IR we parallelized four representative iterative algorithms: EM, SART and their ordered subset (OS) versions on a Linux PC cluster. A micro-forward-back-projector was implemented for increased parallelization. Parameters are cached at the ray level during forward-projection to further reduce time used for back-projection. The speed-up and efficiency factors of our parallel implementation are reported. 1. Introduction The iterative reconstruction (IR) technique is a branch of methods other than the filtered back-projection (FBP) for the computed tomography (CT). We use IR to include algorithms from both the statistical reconstruction (SR) and the algebraic reconstruction technique (ART) because they all compute the final image through a loop of steps. There are many IR algorithms available. Representative among them are the maximum likelihood (ML) expectation maximization (EM) [1 3] and the simultaneous algebraic reconstruction technique (SART) [4 6]. With ML-EM the image is obtained iteratively as an optimal estimate that maximizes the likelihood of the detection of the actual measured photons of a statistical modeling of the imaging system. The EM method can also be deterministically interpreted as the process of minimizing the I-divergence error between the estimated and measured projection data in nonnegative space [7]. With SART the algorithm repeatedly minimizes the mean square error between the estimated and measured projections in real space. Both EM and SART, together with other IR algorithms, are superior to the FBP method in terms of better image quality (contrast and resolution) under noisy projections [8], which is especially prominent in the positron emission tomography (PET) and single photon emission computed tomography (SPECT) where the noise level in the projection is high. IR is capable of modeling the PET and SPECT systems more precisely by compensation for effects of attenuation, scatter and distance dependent detector response (DDDR) [9 13]. IR can also be 0895-3996/04/$17.00 2005 IOS Press and the authors. All rights reserved

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 2 2 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster used with incomplete projection data and for metal artifacts deduction where FBP fails [14]. Modern commercial PET and SPECT scanners have begun to include IR as a standard algorithm in the software package [15]. However, the major disadvantage of the IR is its high demand on computation. For a large set of data from 3D volume scan it might take several hundreds hours for IR to run on a single computer, preventing it from real time clinical applications. The reason is that besides updating the image a single iteration of the IR needs one forward-projection and one back-projection whereas the FBP only needs one back-projection. As an example for EM often at least 30 iterations are needed. One approach to improving the time performance of the IR is to accelerate the algorithm by applying the ordered-subset (OS) concepts [16,17] to the projection data during iteration in which the projections are divided into smaller-sized groups then the IR algorithm is applied to every subset in sequence. For example, the OS-EM has reduced the reconstruction time by an order of magnitude than the EM [16]. Other efforts to speed up the IR algorithm have focused on optimizing the forward and back projection schemes, including the application of the texture rendering hardware to increase the projection calculation speed [18,19]. Another approach has been focused on the parallelization of the computation. As a matter of fact, parallel computing can also quicken the back-projection part of the FBP method. Due to the inherent nature of data parallelization in IR algorithms, parallel implementation applies to all IR algorithms. Various parallelization schemes have been proposed and they can be categorized into two types in terms of the hardware used in the system. The first type utilized a centralized parallel system such as VLSI architecture computers, vector computers, large-scale parallel computers, and shared memory multiprocessor computers [20 26]. The second type of parallel computing systems employed a network of general-purpose computers and the cluster of PCs was connected by a fast local area network (LAN). As the computing technology advances dramatically many recent parallel implementations were based on the second type [27 30]. Clusters have been built on either WinNT or Unix/Linux platform and could be a hybrid of SIMD computers and MIMD system. For example [29], and [31] used a cluster of computers each with four processors. Carefully designed load-balancing and inter-process communication algorithms have also been used to optimize the parallel system s performance [32]. With the development of the distributed computing technology, client-server and peer-to-peer network models have been used to decentralize the image reconstruction system [27,29,33]. Most of these works parallelized EM and OS-EM algorithms for PET data. In this paper we parallelized four representative IR algorithms: EM, OS-EM, SART, and OS-SART on a Linux PC cluster. The projection data were generated from a simulation of the spiral cone-beam X-ray CT scan. A micro-forward-back-projector was implemented for increased parallelization. Parameters are cached at the ray level during forward-projection to further reduce time used for back-projection. In next section we briefly reviewed IR algorithms and introduced the parallelization scheme. The performance of our parallel implementation was reported in the third section. We concluded the paper with discussion in the last section. 2. Methods 2.1. Review of EM and SART algorithms and their ordered-subset versions Compared with the analytic method it is easy to adapt the IR algorithms to various scan geometries and EM, OS-EM, and SART for cone-beam geometry have been reported. In our simulation of the cone-beam X-ray scan, the volume is sampled with a cube of voxels of equal size. (Figure 1) Each voxel

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 3 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster 3 y x z (a) (b) Fig. 1. The cone-beam scan geometry. (a) The volume is sampled with a cube of voxels. (b) The spiral X-ray source locus. has a constant attenuation factor x j,j =1,...,N, where N is the total number of voxels. Estimated projection ˆb i,i=1,...,m is the line integral along the i-th ray path and b i,i=1,...,m is the measured projection, where M is the total number of rays. Let x and b represent the corresponding N-dimensional and M-dimensional column vector respectively; A =(a ij ) be the matrix mapping x to b, where a ij measures the contribution of x j to b i. We have the following linear system: Ax = b (1) The CT image reconstruction problem is to find x. The EM algorithm is formulated as follows: M a ij (b i /ˆb i ) x (k+1) j = x (k) i=1 j,j =1,...,N. (2) M a ij i=1 The SART formula is expressed as: M (b i a ˆb i ) ij i=1 N a ij x (k+1) j = x (k) j=1 j + M,j =1,...,N. (3) a ij i=1 In the ordered-subset (OS) version of IR the projection data are divided into smaller sets S t,t = 1, 2,...,L, where L is the total number of subsets. The IR algorithms are applied to each individual

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 4 4 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster Updated Image forward-project Data Measurement Compare Measurement Measurement Error (Weighted) Back-project Correction Measurement Fig. 2. Use of ordered subsets for iterative reconstruction. For each_iteration Do For each_projection Do forward-projection get-comparison back-projection update-image (a) For each_iteration Do For each_subset Do For each_projection Do forward-projection get-comparison back-projection update-image (b) Fig. 3. The pseudo-code for IR (a) and its OS version (b). subset. The intermediate reconstructed image is used as the input to the next subset reconstruction. A single iteration is defined as a loop of calculations that pass through all subsets. Figure 2 gives a schematic illustration of the ordered- subset IR. The formulation of OS-EM and OS-SART can be found in [16,34]. In this paper the ordering of subsets is simply sequential according to the original projection view s order. 2.2. Parallel implementation of the IR algorithm In this paper we parallelized four representative IR algorithms: EM, OS-EM, SART, and OS-SART. Other IR algorithms can be parallelized in the same manner. These algorithms can be described by the pseudo-code as Fig. 3. The most time-consuming part of IR is the forward-projection and back-projection. The computation involved in steps of forward-projection, get-comparison, and back-projection is in a piecewise manner for all projection data. As soon as one piece of the projection data is calculated after ray-tracing along a specific path, the data can be compared with the corresponding measured projection and then be

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 5 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster 5 Table 1 The time needed in seconds to reconstruct images in Fig. 5 by sequential IR algorithms and their parallel counterparts EM P OS-EM P SART P OS-SART P 32 iterations 2 iterations 16 subsets 32 iterations 2 iterations 16 subsets np = 1 1,420.28 120.79 1,428.18 121.11 np = 2 728.43 100.52 731.50 101.15 np = 4 379.89 98.67 381.96 99.13 np = 8 224.38 108.24 226.14 108.83 np = 16 162.70 140.66 163.20 140.45 Sequential 2,584 192 2,590 192 For each_iteration Do For each_projection_assigned_to_worker_node Do micro-forward-back-projection update-image-on-master-node send-updated-image-to-all-worker-nodes (a) For each_iteration Do For each_subset Do For each_projection_assigned_to_worker_node Do micro-forward-back-projection update-image-on-master-node send-updated-image-to-all-worker-nodes (b) Fig. 4. The pseudo-code for parallel IR (a) and its OS version (b). back-projected. We took an object-oriented view of the abstract definition of a ray which allowed each ray to take care of its three actions (forward-projection, get-comparison, and back-projection) and three pieces of data (estimated projection, measured projection, and the comparison between the two). The ray was implemented as a micro-forward-back-projector and its implementation facilitated parallelization of the IR algorithm. Because the back-projection is along the same ray-path and using the same interpolation, we cached six parameters (three for ray path direction three for interpolation) during the forward-projection step. There are six direction parameters and eight interpolation weight factors in total. We find that more aggressive caching does not reduce computation time further because the cost for saving and retrieving more parameters overruns the time saved by caching. We divided projection data into groups and assigned each group to a process on the worker node. Figure 4 The worker node carried out computation of the micro-forward-back-projection. The master node gathered temporary images after back-projection and combined them to generate the updated image and then send it back to all worker nodes for next iteration of reconstruction. For OS version of IR algorithms projection data within each subset was divided and assigned among worker nodes. We used message passing model and the MPI protocol [35] for communication between master node and worker nodes. 2.3. Numerical simulation A 3D Shepp-Logan phantom bounded by a 2.0 3 volume of arbitrary unit was used as the object. The phantom was sampled by 128 3 voxels of equal size. 96 projection views were measured along a spiral

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 6 6 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster Node 3 Switch 2 Client Node 1 Switch 1 Node 7 Node 2 Node 8 Switch 3 Node 12 Fig. 5. The PC cluster topology. with one turn and a total translation of 1.0 along the axial direction. The source-origin distance was 5.0. A planar detector with 128 64 cells was used which defined a cone-angle of 20 and a fan angle of 30. Tri-linear interpolation was used in ray-tracing to obtain the value at sampling points from their eight neighboring voxels along the ray path. These neighboring points and associated interpolation weights were cached for a single ray so that during back-projection these parameters would not be computed again. All data were stored as 4-byte floating-point type. The PC cluster has 14 nodes running Linux operating system with dual 1G Hz Pentium III processors and 1 GB memory. The processor cache size is 32 KB L1 and 256 KB L2. They are connected with gigabit/s switches. Figure 5 illustrates the topology of the cluster. The MPI implementation MPICH 1.2.5 [36,37] is installed as the message-passing interface among nodes. The sequential and parallel C code was compiled with gcc and mpicc, respectively. 3. Performance tests results Figure 6 shows the reconstruction results from four parallel IR algorithms. The image is one slice of the 3D Shepp-Logan phantom at z = 0.25 where all ellipsoids intersect with the cross-section. Table 1 gives the time needed in seconds to reconstruct images in Fig. 6 by sequential IR algorithms and their parallel counterpart. Timing for sequential IR algorithms was taken using system clock. Timing for parallel IR algorithms was taken using MPI built-in function MPI Wtime(), which is often referred to as the wall time. The MPI function MPI Barrier() was called when timing started and ended to synchronize all processes. We can see that reconstruction of images with same visual appearance in parallel is a magnitude faster than in sequence for both EM and SART. For OS-EM and OS-SART the parallel implementation is approximately two times faster than their sequential counterparts. Note that the parallel reconstruction using only 1 process is still faster than the sequential reconstruction with a speed-up gain of 1.81. The reason is that the MPICH C compiler mpicc has internal optimization for multi-processor nodes. Tables 2 and 3 give the speed up and efficacy factors for each parallel algorithm. The speed-up is defined as: Time parallel (np =1) Speed up (np) = (3.1) Time Parallel (np) The efficiency is defined as: Speed up (np) Efficiency = (3.2) np

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 7 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster 7 Table 2 The speed-up factors for each parallel IR algorithms EM P OS-EM P SART P OS-SART P np = 1 1 1 1 1 np = 2 1.95 1.20 1.95 1.20 np = 4 3.74 1.22 3.74 1.22 np = 8 6.33 1.12 6.32 1.11 np = 16 8.73 0.859 8.75 0.862 Table 3 The efficiency factors for each parallel IR algorithms EM P OS-EM P SART P OS-SART P np = 1 1 1 1 1 np = 2 0.975 0.60 0.975 0.60 np = 4 0.935 0.305 0.935 0.305 np = 8 0.791 0.140 0.790 0.139 np = 16 0.546 0.0538 0.547 0.0538 (a) (b) (c) (d) Fig. 6. The reconstructed image slice at position z = 0.25. (a) EM 32 iterations. (b) OS-EM 2 iterations 16 subsets. (c) SART 32 iterations. (d) OS-SART 2 iterations 16 subsets. where np is the number of processes. The optimal linear speed-up is obtained if Efficiency = 1. If Efficiency < 1 np the parallel implementation actually exhibits slow down. Please note that the EM P and SART P have almost identical speed up and efficiency performance. The same thing happens to OS-EM P and OS-SART P. Therefore in Fig. 7 only the speed-up and efficiency for EM P and OS-EM P are drawn. From the figure we can see that the greatest speed-up is achieved for EM and OS-EM using 16 processes and 4 processes, respectively. The sub-linear speed up for EM P after nodes 8 is caused by the extra communication when passing from node 3 to node 4 via switch 2. (See Fig. 5) Due to the large load of data communication resulted from ordered subset technique, the parallel OS-EM that partitions data only in the projection domain is very inefficient in terms of processes used. Optimized data partition and optimization schemes are needed to improve the efficiency and to reconstruct large volume. 4. Discussion With the ever-increasing computing power from the commodity desktop computers a PC cluster becomes a practical tool for both research and application purposes. In this paper we parallelized four representative IR algorithms: EM, OS-EM, SART, and OS-SART with spiral cone-beam scan simulation on a Linux PC cluster to demonstrate this capability. For ordered subset versions of EM and SART we used the simple sequential-order grouping of projection data. To achieve the same image reconstruction

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 8 8 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster Fig. 7. Top-left and right: the speed-up factors for EM P and OS-EM P, respectively. Bottom-left and right: the efficiency factors for EM P and OS-EM P, respectively quality the parallel EM and SART take less than three minutes while the normal sequential reconstructions takes more than 47 minutes. The speed-up factor is not so prominent, however, for the OS-EM and OS-SART. The reason is that traditional ordered subsets technique is inherently sequential within one iteration as every subset of projections is used in turn. A direct data-parallel partition scheme is very inefficient in this manner due to the large data integration and dispatching overhead occurred at the beginning and the end of each subset iteration. This suggests a future work of the redesign of a parallel algorithm for the ordered subsets. Acknowledgements We appreciate helpful suggestions from Dr. Shiying Zhao with the University of Iowa Radiology Department. We would like to thank Mr. James Hunsaker for the University of Iowa ITS cluster server management. This work is supported in part by an NIH grant (R01 DC03590).

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 9 References X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster 9 [1] J. Rockmore and A. Macovski, A maximum likelihood approach to image reconstruction, Proc Joint Automatic Control Conf, 1977, pp. 782 786. [2] L.A. Shepp and Y. Valdi, Maximum likelihood reconstruction for emission tomography, IEEE, Trans. Med. Imag. MI-1 (1982), 113 122. [3] K. Lange and R. Carson, EM reconstruction algorithms for emission and transmission tomography, J. Comput. Assist. Tomog. 8(2), (April, 1984), 302 316, [4] A.H. Andersen, Algebraic reconstruction in CT from limited views, IEEE Trans. Med. Imag. 8 (1989), 50 55. [5] A.H. Andersen and A.C. Kak, Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm, Ultrasonic Imaging 6 (1984), 81 94. [6] M. Jiang and G. Wang, Convergence of the simultaneous algebraic reconstruction technique (SART), IEEE Trans. Image Processing 12 (2003), 957 961. [7] D.L. Synder et al., Deblurring subject to nonnegativity constraints, IEEE Trans. Signal Processing 40 (1992), 1143 1150. [8] B.F. Hutton et al., A clinical perspective of accelerated statistical reconstruction, European Journal of Nuclear Medicine 24(7) (July, 1997). [9] D. Blocket et al., Maximum-likelihood reconstruction with ordered subsets in bone SPECT, J. Nucl. Med. 40(12) (1999), 1978 1984. [10] C.Y. Bai et al., Postinjection single photon transmission tomography with ordered-subset algorithms for whole-body PET imaging, IEEE Trans. Nucl. Sci. 49(1) (2002), 74 81. [11] C. Kamphuis et al., Dual matrix ordered subsets reconstruction for accelerated 3D scatter compensation in single-photon emission tomography, Eur. J. Nucl. Med 25(1) (1998), 8 18. [12] S.H. Manglos et al., Transmission maximum-likelihood reconstruction with ordered subsets for cone-beam CT, Phys. Med. Biol. 40(7) (1995), 1225 1241. [13] M.V. Narayanan et al., Optimization of regularization of attenuation and scatter- corrected Tc-99m cardiac SPECT studies for defect detection using hybrid images, IEEE Trans. Nucl. Med. 48(3) (2001), 785 789. [14] G. Wang et al., Iterative deblurring for CT metal artifact reduction, IEEE Trans. Med. Imag. 15 (1996), 657 664. [15] R. Leahy and C. Byrne, Recent developments in iterative image reconstruction for PET and SPECT, Editorial, IEEE Trans. Med. Imag. 19(4) (2000). [16] H.M. Hudson and R.S. Larkin, Accelerated image reconstruction using ordered subsets of projection data, IEEE Trans. Med. Imag. 13 (1994), 601 609. [17] C. Kamphuis and F.J. Beekman, Accelerated iterative transmission CT reconstruction using an ordered subsets convex algorithm, IEEE Trans. Med. Imaging 17(6) (December, 1998). [18] K. Mueller et al., Fast implementations of algebraic methods for three-dimensional reconstruction from cone-beam data, IEEE Trans. Med. Imag. 18 (1999), 538 548. [19] K. Mueler et al., Rapid 3-D cone-beam reconstruction with the simultaneous algebraic reconstruction technique using 2-D texture mapping hardware, IEEE Trans. Med. Imag. 19 (2000), 1227 1237. [20] W.F. Jones et al., Design of a super fast three-dimensional projection system for positron emission tomography, IEEE Trans. Nucl. Sci. 37 (1990), 800 804. [21] C. Guerrini and G. Spaletta, An image reconstruction algorithm in tomography: A version for the CRAY X-MP vector computer, Computers and Graphics 13 (1989), 367 372. [22] M. Miller and C. Butler, 3-D maximum a posteriori estimation for single photon emission computed tomography on massively-parallel comuters, IEEE Trans. Med. Imag. 12 (1993), 560 565. [23] A. McCarty and M. Miller, Maximum likelihood SPECT in clinical computation times using mesh-connected parallel computers, IEEE Trans, Med. Imag. 10 (1991), 426 436. [24] M. Atkins et al., Use of transputers in a 3-D positron emission tomography, IEEE Trans. Med. Imag. 10(3) (1991), 276 283. [25] C.M. Chen et al., A parallel implementation of 3-D CT image reconstruction on hypercube multiprocessor, IEEE Trans. Nucl. Sci. 37(3) (1990), 1333 1346. [26] C. Johnson and A. Sofer, A data-parallel algorithm for iterative tomographic image reconstruction, http://www. nih.gov/publications. [27] D. Shattuck et al., Internet2-based 3D PET image reconstruction using a PC cluster, Phys. Med. Biol. 47 (2002), 2785 2795. [28] G. Kontaxakis et al., Grid processing techniques for the iterative reconstruction of large clinical positron emission tomography sonogram datasets, Health Grid 2003, http://ww.die.upm.es/im. [29] S. Vollmar et al., Heinzel Cluster: accelerated reconstruction for FORE and OSEM3D, Phys. Med. Biol. 47 (2002), 2651 2658.

Galley Proof 30/04/2005; 12:41 File: xst127.tex; BOKCTP/ljl p. 10 10 X. Li et al. / Parallel iterative cone beam CT image reconstruction on a PC cluster [30] A. Bevilacqua, Evaluation of a fully 3-D BPF method for small animal PET images on MIMD architectures, International Journal of Modern Physics C 10(4) (1999), 1 17. [31] W. Backfrieder et al., Web-based parallel ML EM reconstruction for SPECT on SMP clusters, Proceeding of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Science, Las Vegas, Nevada, USA, June, 2001. [32] A. Bevilacqua, A dynamic load balancing method on a heterogeneous cluster of workstations, Special Issue: Parallel Computing on Networks of Computers 23(1) (1999), 49 56. [33] X. Li et al., P2P-enhanced distributed computing in medical image reconstruction, Proceeding of the International Conference on PDPTA, Las Vegas, Nevada, USA, June, 2004. http://www.world-academy-of-science.org. [34] X. Li, M. Jiang and G. Wang, A numerical simulator in VC++ on PC for iterative image reconstruction, J. of X-Ray Science and Technology 11 (2003), 61 70. [35] The message passing interface (MPI) standard, http://www-unix.mcs.anl.gov/mpi/. [36] W. Gropp, E. Lusk, N. Doss and A. Skjellum, A high-performance, portable implementation of the message passing interface standard, Parallel Computing 22(6) (1996), 789 828. [37] W.D. Gropp and E. Lusk, User s guide for MPICH, a portable implementation of MPI, ANL-96/6, Mathematics and Computer Science Division, Argonne National Laboratory, 1996.