Algebraic Iterative Methods for Computed Tomography


Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic Iterative Methods 1 / 51

Plan for Today
1 A bit of motivation.
2 The algebraic formulation.
3 Matrix notation and interpretation.
4 Kaczmarz's method (also known as ART) - fully sequential.
5 Cimmino's method and variants - fully simultaneous.
6 More linear algebra: null space and least squares.
7 The optimization viewpoint.
Points to take home today: Linear algebra provides a concise framework for formulating the algorithms. Convergence analysis of iterative algebraic methods: Kaczmarz's method (= ART) converges for consistent problems only; Cimmino's method always converges. Be careful with the null space. Least squares problems always have a solution. The optimization viewpoint leads to a broad class of iterative methods. Per Christian Hansen Algebraic Iterative Methods 2 / 51

FBP: Filtered Back Projection. This is the classical method for 2D reconstructions. There are similar methods for 3D, such as FDK. Many years of use have produced lots of practical experience. The FBP method is very fast (it uses the Fast Fourier Transform)! The FBP method has low memory requirements. With plenty of data, FBP gives very good results. Example with 3% noise: Per Christian Hansen Algebraic Iterative Methods 3 / 51

FBP Versus Algebraic Methods. Limited data, or a nonuniform distribution of projection angles or rays, leads to artifacts in FBP reconstructions. It is difficult to incorporate constraints (e.g., nonnegativity) in FBP. Algebraic methods are more flexible and adaptive. Same example with 3% noise and projection angles 15°, 30°, ..., 180°: Algebraic Reconstruction Technique, box constraints (pixel values in [0,1]). Per Christian Hansen Algebraic Iterative Methods 4 / 51

Another Motivating Example: Missing Data Irregularly spaced angles & missing angles also cause difficulties for FBP: Per Christian Hansen Algebraic Iterative Methods 5 / 51

Setting up the Algebraic Model. The damping of the ith X-ray through the object is a line integral of the attenuation coefficient χ(s) along the ray (from Lambert-Beer's law):
$b_i = \int_{\mathrm{ray}\ i} \chi(s)\, d\ell, \quad i = 1, 2, \ldots, m.$
Assume that χ(s) is a constant $x_j$ in pixel j. This leads to:
$b_i = \sum_{j \in \mathrm{ray}\ i} a_{ij} x_j, \quad a_{ij} = \text{length of ray } i \text{ in pixel } j,$
where the sum is over those pixels j that are intersected by ray i. If we define $a_{ij} = 0$ for those pixels not intersected by ray i, then we have a simple sum
$b_i = \sum_{j=1}^{n} a_{ij} x_j, \quad n = \text{number of pixels}.$
Per Christian Hansen Algebraic Iterative Methods 6 / 51

A Big and Sparse System. If we collect all m equations then we arrive at a system of linear equations A x = b with a very sparse system matrix A. Example: 5 × 5 pixels and 9 rays. A really big advantage is that we only set up equations for the data that we actually have. In case of missing data, e.g., for certain projection angles or certain rays in a projection, we just omit those from the linear system. Per Christian Hansen Algebraic Iterative Methods 7 / 51

The System Matrix is Very Sparse. Another example: 256 × 256 pixels and 180 projections with 362 rays each. The system matrix A is 65,160 × 65,536 and has 4.27 · 10^9 elements. There are 15,018,524 nonzero elements, corresponding to a fill of 0.35%. Per Christian Hansen Algebraic Iterative Methods 8 / 51
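To make the sparsity concrete, here is a minimal sketch in Python/NumPy (the slides otherwise refer to MATLAB; Python is used here just to keep the examples self-contained) that assembles a sparse system matrix for a deliberately simplified geometry: an N × N pixel image probed only by N horizontal and N vertical rays, each intersecting a full row or column of pixels with unit intersection length. The function name and the simplified ray geometry are illustrative assumptions, not the setup used for the figures above.

```python
import numpy as np
import scipy.sparse as sp

def grid_ray_matrix(N):
    """Sparse system matrix for an N-by-N image probed by N horizontal
    and N vertical rays, with all intersection lengths set to 1
    (rows 0..N-1: horizontal rays, rows N..2N-1: vertical rays)."""
    rows, cols, vals = [], [], []
    for i in range(N):                  # horizontal ray i hits pixels (i, 0..N-1)
        for j in range(N):
            rows.append(i); cols.append(i * N + j); vals.append(1.0)
    for j in range(N):                  # vertical ray j hits pixels (0..N-1, j)
        for i in range(N):
            rows.append(N + j); cols.append(i * N + j); vals.append(1.0)
    return sp.csr_matrix((vals, (rows, cols)), shape=(2 * N, N * N))

A = grid_ray_matrix(16)                 # a 32 x 256 system: very under-determined
fill = A.nnz / (A.shape[0] * A.shape[1])
print(A.shape, A.nnz, f"fill = {fill:.2%}")   # (32, 256) 512 fill = 6.25%
```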

The Simplest Algebraic Problem. One unknown, no noise. Now, with noise in the data, compute a weighted average. We know from statistics that the solution's variance is inversely proportional to the number of data. So more data is better. Let us immediately continue with a 2 × 2 image... Per Christian Hansen Algebraic Iterative Methods 9 / 51

A Sudoku Problem. Four unknowns, four rays, giving a system of linear equations A x = b. Unfortunately there are infinitely many solutions, parametrized by $k \in R$. (There is an arbitrary component in the null space of the matrix A.) Per Christian Hansen Algebraic Iterative Methods 10 / 51

More Data Gives a Unique Solution. With enough rays the problem has a unique solution. Here, one more ray is enough to ensure a full-rank matrix. The difficulties associated with the discretized tomography problem are closely linked with properties of the coefficient matrix A: The sensitivity of the solution to the data errors is characterized by the condition number $\kappa = \|A\|_2 \|A^{-1}\|_2$ (not discussed today). The uniqueness of the solution is characterized by the rank of A, the number of linearly independent rows or columns. Per Christian Hansen Algebraic Iterative Methods 11 / 51

Algebraic Reconstruction Methods. In principle, all we need to do in the algebraic formulation is to solve the large sparse linear system A x = b. Math: $x = A^{-1} b$; MATLAB: x = A\b. How hard can that be? Actually, this can be a formidable task if we try to use a traditional approach such as Gaussian elimination. Researchers in tomography have therefore focused on the use of iterative solvers, and they have rediscovered many methods developed by mathematicians... In tomography they are called algebraic reconstruction methods. They are much more flexible than FBP, but at a higher computational cost! Per Christian Hansen Algebraic Iterative Methods 12 / 51

Some Algebraic Reconstruction Methods.
Fully sequential methods: Kaczmarz's method + variants. These are row-action methods: they update the solution using one row of A at a time. Fast convergence.
Fully simultaneous methods: Landweber, Cimmino, CAV, DROP, SART, SIRT, ... These methods use all the rows of A simultaneously in one iteration (i.e., they are based on matrix multiplications). Slower convergence.
Krylov subspace methods (rarely used in tomography): CGLS, LSQR, GMRES, ... These methods are also based on matrix multiplications.
Per Christian Hansen Algebraic Iterative Methods 13 / 51

Matrix Notation and Interpretation. Notation: write A in terms of its columns $c_j$ and rows $r_i$,
$A = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix} = \begin{bmatrix} r_1 \\ \vdots \\ r_m \end{bmatrix}.$
The matrix A maps the discretized absorption coefficients (the vector x) to the data in the detector pixels (the elements of the vector b) via:
$b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = A x = \begin{bmatrix} r_1 x \\ r_2 x \\ \vdots \\ r_m x \end{bmatrix} = x_1 c_1 + x_2 c_2 + \cdots + x_n c_n \quad \text{(a linear combination of the columns)}.$
The ith row of A maps x to detector element i via the ith ray:
$b_i = r_i x = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, 2, \ldots, m.$
Per Christian Hansen Algebraic Iterative Methods 14 / 51

Example of Column Interpretation. A 32 × 32 image has four nonzero pixels with intensities 1, 0.8, 0.6, 0.4. In the vector x these four pixels correspond to entries 468, 618, 206, 793. Hence the sinogram, represented as a vector b, takes the form $b = 0.6\, c_{206} + 1.0\, c_{468} + 0.8\, c_{618} + 0.4\, c_{793}$. Per Christian Hansen Algebraic Iterative Methods 15 / 51

Forward and Back Again.
Forward projection. Consider ray i through a 3 × 3 image, with $a_{ij}$ = length of ray i in pixel j:
$r_i = \begin{bmatrix} a_{i1} & a_{i2} & 0 & a_{i4} & 0 & 0 & a_{i7} & 0 & 0 \end{bmatrix}, \quad b_i = r_i x = a_{i1} x_1 + a_{i2} x_2 + a_{i4} x_4 + a_{i7} x_7.$
The forward projection b = A x maps all image pixels $x_j$ along the ray to the detector data $b_i$.
Back projection. Another name for multiplication with $A^T$:
$A^T b = \sum_{i=1}^{m} b_i\, r_i^T.$
Each datum $b_i$ is distributed back along the ith ray to the pixels, with weights $a_{ij}$:
$b_i r_i^T = \begin{bmatrix} b_i a_{i1} & b_i a_{i2} & 0 & b_i a_{i4} & 0 & 0 & b_i a_{i7} & 0 & 0 \end{bmatrix}^T.$
Per Christian Hansen Algebraic Iterative Methods 16 / 51
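As a small numerical illustration of the two operations, here is a sketch for a single ray through a 3 × 3 image; the intersection lengths below are made-up values, assumed only for this example.

```python
import numpy as np

# One ray through a 3-by-3 image (pixels numbered 1..9 as on the slide);
# the nonzero entries are the (made-up) intersection lengths a_i1, a_i2, a_i4, a_i7.
r_i = np.array([0.7, 0.3, 0.0, 0.9, 0.0, 0.0, 0.5, 0.0, 0.0])
x = np.arange(1.0, 10.0)        # the image, stored as a vector of length 9

b_i = r_i @ x                   # forward projection: weighted sum of pixels along the ray
back = b_i * r_i                # back projection: distribute b_i along the ray, weights a_ij

print(b_i)                      # 8.4
print(back)                     # nonzero only in the pixels hit by the ray
```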

Geometric Interpretation of A x = b.
$r_1 x = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1$
$r_2 x = a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2$
$\vdots$
$r_m x = a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m$
Each equation $r_i x = b_i$ defines an affine hyperplane in $R^n$: in $R^2$ the equation $a_{i1} x_1 + a_{i2} x_2 = b_i$ describes a line, and in $R^3$ the equation $a_{i1} x_1 + a_{i2} x_2 + a_{i3} x_3 = b_i$ describes a plane.
Per Christian Hansen Algebraic Iterative Methods 17 / 51

Geometric Interpretation of the Solution. Assuming that the solution to A x = b is unique, it is the point $x \in R^n$ where all the m affine hyperplanes intersect. Example with m = n = 2: the two lines $a_{11} x_1 + a_{12} x_2 = b_1$ and $a_{21} x_1 + a_{22} x_2 = b_2$ intersect in a single point in $R^2$. Per Christian Hansen Algebraic Iterative Methods 18 / 51

Kaczmarz's Method = Algebraic Reconstruction Technique. A simple iterative method based on the geometric interpretation. In each iteration, and in a cyclic fashion, compute the new iteration vector such that precisely one of the equations is satisfied. This is achieved by projecting the current iteration vector x on one of the hyperplanes $r_i x = b_i$ for i = 1, 2, ..., m, 1, 2, ..., m, 1, 2, ... Originally proposed in 1937, and independently suggested under the name ART by Gordon, Bender & Herman in 1970 for tomographic reconstruction. Per Christian Hansen Algebraic Iterative Methods 19 / 51

Orthogonal Projection on Affine Hyperplane. Let $H_i = \{x \in R^n \mid r_i x = b_i\}$. The orthogonal projection $P_i(z)$ of an arbitrary point z on the affine hyperplane $H_i$ defined by $r_i x = b_i$ is given by:
$P_i(z) = z + \frac{b_i - r_i z}{\|r_i\|_2^2}\, r_i, \qquad \|r_i\|_2^2 = r_i r_i^T.$
In words, we scale the row vector $r_i$ by $(b_i - r_i z)/\|r_i\|_2^2$ and add it to z.
Per Christian Hansen Algebraic Iterative Methods 20 / 51
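A one-line implementation of this projection, with a quick check that the projected point indeed lies on the hyperplane (the numbers are arbitrary):

```python
import numpy as np

def project_onto_hyperplane(z, r_i, b_i):
    """Orthogonal projection of z onto the affine hyperplane { x : r_i @ x = b_i }."""
    return z + (b_i - r_i @ z) / (r_i @ r_i) * r_i

r_i, b_i = np.array([3.0, 4.0]), 2.0
z = np.array([1.0, 1.0])
p = project_onto_hyperplane(z, r_i, b_i)
print(r_i @ p)        # 2.0 (up to rounding): p satisfies the equation
```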

Kaczmarz's Method. We thus obtain the following algebraic formulation:
Basic Kaczmarz algorithm
  x^(0) = initial vector
  for k = 0, 1, 2, ...
    i = k (mod m)
    $x^{(k+1)} = P_i\left(x^{(k)}\right) = x^{(k)} + \frac{b_i - r_i x^{(k)}}{\|r_i\|_2^2}\, r_i$
  end
Each time we have performed m iterations of this algorithm, we have performed one sweep over the rows of A. Other choices of sweeps:
Symmetric Kaczmarz: i = 1, 2, ..., m-1, m, m-1, ..., 3, 2.
Randomized Kaczmarz: select row i randomly, possibly with probability proportional to the row norm $\|r_i\|_2$.
Per Christian Hansen Algebraic Iterative Methods 21 / 51
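A minimal dense-matrix sketch of the basic cyclic algorithm; randomized or symmetric sweeps only change how the row index i is chosen. This is for illustration only (packages such as AIR Tools contain tuned implementations), and the small test problem is an assumption.

```python
import numpy as np

def kaczmarz(A, b, n_sweeps=50, x0=None):
    """Basic cyclic Kaczmarz / ART; one sweep = one pass over all m rows."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    row_norms_sq = np.sum(A * A, axis=1)
    for _ in range(n_sweeps):
        for i in range(m):
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# small consistent test problem with a unique solution
A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]])
x_exact = np.array([1.0, 2.0])
print(kaczmarz(A, A @ x_exact))      # approaches [1, 2]
```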

Convergence Issues. The convergence of Kaczmarz's method is quite obvious from the graph on slide 19, but can we say more? Difficulty: the ordering of the rows of A influences the convergence rate:
$\begin{bmatrix} 1.0 & 1.0 \\ 2.0 & 1.0 \\ 1.1 & 1.0 \\ 3.0 & 1.0 \end{bmatrix} x = \begin{bmatrix} 2.1 \\ 4.0 \\ 3.7 \\ 4.7 \end{bmatrix}$
The ordering 1, 3, 2, 4 is preferable: almost twice as fast.
Per Christian Hansen Algebraic Iterative Methods 22 / 51

Convergence of Kaczmarz's Method. One way to avoid the difficulty associated with the influence of the ordering of the rows is to assume that we select the rows randomly. For simplicity, assume that A is invertible and that all rows of A are scaled to unit 2-norm. Then the expected value $E(\cdot)$ of the error norm satisfies:
$E\left(\|x^{(k)} - \bar{x}\|_2^2\right) \le \left(1 - \frac{1}{n \kappa^2}\right)^{k} \|x^{(0)} - \bar{x}\|_2^2, \quad k = 1, 2, \ldots,$
where $\bar{x} = A^{-1} b$ and $\kappa = \|A\|_2 \|A^{-1}\|_2$. This is linear convergence. When κ is large we have
$\left(1 - \frac{1}{n \kappa^2}\right)^{k} \approx 1 - \frac{k}{n \kappa^2}.$
After k = n steps, corresponding to one sweep, the reduction factor is approximately $1 - 1/\kappa^2$. Note that there are often orderings for which the convergence is faster!
Per Christian Hansen Algebraic Iterative Methods 23 / 51

Cyclic Convergence. So far we have assumed that there is a unique solution $\bar{x} = A^{-1} b$ that satisfies A x = b, i.e., all the affine hyperplanes associated with the rows of A intersect in a single point. What happens when this is not true? We observe cyclic convergence (example with m = 3, n = 2). Kaczmarz's method can be brought to converge to a unique point, and we will discuss the modified algorithm later today. Per Christian Hansen Algebraic Iterative Methods 24 / 51

From Sequential to Simultaneous Updates. Kaczmarz's method accesses the rows sequentially. Cimmino's method accesses the rows simultaneously and computes the next iteration vector as the average of all the projections of the previous iteration vector:
$x^{(k+1)} = \frac{1}{m} \sum_{i=1}^{m} P_i\left(x^{(k)}\right) = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(k)} + \frac{b_i - r_i x^{(k)}}{\|r_i\|_2^2}\, r_i\right) = x^{(k)} + \frac{1}{m} \sum_{i=1}^{m} \frac{b_i - r_i x^{(k)}}{\|r_i\|_2^2}\, r_i.$
Per Christian Hansen Algebraic Iterative Methods 25 / 51

Matrix formulation of Cimmino's Method. We can write the updating in our matrix-vector formalism as follows:
$x^{(k+1)} = x^{(k)} + \frac{1}{m} \sum_{i=1}^{m} \frac{b_i - r_i x^{(k)}}{\|r_i\|_2^2}\, r_i = x^{(k)} + \frac{1}{m} \begin{bmatrix} \frac{r_1^T}{\|r_1\|_2^2} & \cdots & \frac{r_m^T}{\|r_m\|_2^2} \end{bmatrix} \begin{bmatrix} b_1 - r_1 x^{(k)} \\ \vdots \\ b_m - r_m x^{(k)} \end{bmatrix} = x^{(k)} + A^T M^{-1}\left(b - A x^{(k)}\right),$
where we introduced the diagonal matrix $M = \mathrm{diag}\left(m \|r_i\|_2^2\right)$.
Per Christian Hansen Algebraic Iterative Methods 26 / 51

Cimmino's Method. We thus obtain the following formulation:
Basic Cimmino algorithm
  x^(0) = initial vector
  for k = 0, 1, 2, ...
    $x^{(k+1)} = x^{(k)} + A^T M^{-1}\left(b - A x^{(k)}\right)$
  end
Note that one iteration here involves all the rows of A, while one iteration in Kaczmarz's method involves a single row. Therefore, the computational work in one Cimmino iteration is equivalent to m iterations (a sweep over all the rows) in Kaczmarz's basic algorithm. The issue of finding a good row ordering is, of course, absent from Cimmino's method.
Per Christian Hansen Algebraic Iterative Methods 27 / 51
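The same kind of sketch for Cimmino's method; note how one iteration touches all rows through the products with A and A^T. The small test problem is the same assumed one as in the Kaczmarz sketch.

```python
import numpy as np

def cimmino(A, b, n_iter=500, x0=None):
    """Basic Cimmino iteration x <- x + A^T M^{-1} (b - A x), M = diag(m ||r_i||_2^2)."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    Minv = 1.0 / (m * np.sum(A * A, axis=1))      # diagonal of M^{-1}
    for _ in range(n_iter):
        x = x + A.T @ (Minv * (b - A @ x))
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]])
x_exact = np.array([1.0, 2.0])
print(cimmino(A, A @ x_exact))       # approaches [1, 2], but needs more iterations
```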

Convergence Study. Assume $x^{(0)} = 0$ and let I denote the n × n identity matrix; then:
$x^{(k+1)} = \sum_{j=0}^{k} \left(I - A^T M^{-1} A\right)^{j} A^T M^{-1} b = \left(I - \left(I - A^T M^{-1} A\right)^{k+1}\right) \left(A^T M^{-1} A\right)^{-1} A^T M^{-1} b.$
If A is invertible then $\left(A^T M^{-1} A\right)^{-1} A^T M^{-1} b = A^{-1} M A^{-T} A^T M^{-1} b = A^{-1} b$. Moreover, the largest eigenvalue of the symmetric matrix $I - A^T M^{-1} A$ is strictly smaller than one, and therefore $\left(I - \left(I - A^T M^{-1} A\right)^{k+1}\right) \to I$ for $k \to \infty$. Hence the iterates $x^{(k)}$ converge to the solution $\bar{x} = A^{-1} b$.
Per Christian Hansen Algebraic Iterative Methods 28 / 51

Convergence of Cimmino's Method. To simplify the result, assume that A is invertible and that the rows of A are scaled such that $\|A\|_2^2 = m$. Then
$\|x^{(k)} - \bar{x}\|_2^2 \le \left(1 - \frac{2}{1 + \kappa^2}\right)^{k} \|x^{(0)} - \bar{x}\|_2^2,$
where $\bar{x} = A^{-1} b$, $\kappa = \|A\|_2 \|A^{-1}\|_2$, and we have linear convergence. When $\kappa \gg 1$ we have the approximate upper bound
$\|x^{(k)} - \bar{x}\|_2^2 \lesssim \left(1 - 2/\kappa^2\right)^{k} \|x^{(0)} - \bar{x}\|_2^2,$
showing that in each iteration the error is reduced by a factor $1 - 2/\kappa^2$. This is almost the same factor as in one sweep through the rows of A in Kaczmarz's method.
Per Christian Hansen Algebraic Iterative Methods 29 / 51

Rectangular and/or Rank-Deficient Matrices. In tomography, $A \in R^{m \times n}$ is almost always a rectangular matrix: $m \ne n$. It is also very common that A does not have full rank. We need to set the stage for treating such matrices. The rank r of A is the number of linearly independent rows (equal to the number of linearly independent columns), and $r \le \min(m, n)$. The range $\mathcal{R}(A)$ is the linear subspace spanned by the columns of A:
$\mathcal{R}(A) \equiv \{u \in R^m \mid u = \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_n c_n, \ \text{arbitrary } \alpha_j\}.$
The null space $\mathcal{N}(A)$ is the linear subspace of all vectors mapped to zero:
$\mathcal{N}(A) \equiv \{v \in R^n \mid A v = 0\}.$
The dimensions of the two subspaces are r and n - r, respectively.
Per Christian Hansen Algebraic Iterative Methods 30 / 51

A Small Example. Consider the 3 × 3 matrix
$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.$
This matrix has rank r = 2 since the middle row is the average of the first and third rows, which are linearly independent. The range $\mathcal{R}(A)$ and null space $\mathcal{N}(A)$ consist of all vectors of the forms
$\alpha_1 \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + \alpha_2 \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} = \begin{bmatrix} \alpha_1 + 3\alpha_2 \\ 4\alpha_1 + 6\alpha_2 \\ 7\alpha_1 + 9\alpha_2 \end{bmatrix} \quad \text{and} \quad \alpha_3 \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix},$
respectively, for arbitrary $\alpha_1$, $\alpha_2$, and $\alpha_3$.
Per Christian Hansen Algebraic Iterative Methods 31 / 51

A Small Example, Continued. Consider two linear systems with the matrix A from the previous example:
$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} x = \begin{bmatrix} 14 \\ 20 \\ 50 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} x = \begin{bmatrix} 6 \\ 15 \\ 24 \end{bmatrix}.$
The left system has no solution because $b \notin \mathcal{R}(A)$; no matter which linear combination of the columns of A we create, we can never form this b. The right system has infinitely many solutions; any vector of the form
$x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \alpha \ \text{arbitrary},$
satisfies this equation. The arbitrary component is in the null space $\mathcal{N}(A)$.
Per Christian Hansen Algebraic Iterative Methods 32 / 51
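These statements are easy to verify numerically; a short check using NumPy (np.linalg.lstsq returns a least squares solution even for rank-deficient A):

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
print(np.linalg.matrix_rank(A))           # 2

# null space from the SVD: the right singular vector with (numerically) zero singular value
U, s, Vt = np.linalg.svd(A)
print(Vt[-1] / Vt[-1, 0])                 # proportional to (1, -2, 1)

b_left  = np.array([14., 20., 50.])       # not in R(A): no exact solution
b_right = np.array([ 6., 15., 24.])       # in R(A): infinitely many solutions
for b in (b_left, b_right):
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(A @ x - b))      # > 0 for b_left, ~0 for b_right
```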

Null Space Artifacts in Tomography I. The image has 16 × 16 pixels and we use 16 horizontal and 16 vertical X-rays, giving a very under-determined system. Left: three different vectors in the null space $\mathcal{N}(A)$. Right: the exact ("ground truth") image x and the reconstruction. One pixel at the intersection of the vertical and horizontal strips has a large and incorrect value, and the values of the horizontal and vertical strips are slightly too low. Per Christian Hansen Algebraic Iterative Methods 33 / 51

Null Space Artifacts in Tomography II. The image has 29 × 29 pixels and we use projection angles in [50°, 130°]. A is 841 × 841 and rank deficient; the dimension of $\mathcal{N}(A)$ is 24. Both reconstructions are imperfect, and the vertical structure of the second test image is almost completely lost in the reconstruction. Per Christian Hansen Algebraic Iterative Methods 34 / 51

Null Space Artifacts in Tomography III. Same example: the 24 images that span the null space $\mathcal{N}(A)$ represent information about missing vertical structures in the reconstruction. This illustrates that intuition and mathematics go hand-in-hand. Per Christian Hansen Algebraic Iterative Methods 35 / 51

Consistent and Inconsistent Systems. A system is consistent if there exists at least one x such that A x = b, i.e., such that b is a linear combination of the columns $c_i$ of A. This is equivalent to the requirement $b \in \mathcal{R}(A)$. Otherwise the system is inconsistent, $b \notin \mathcal{R}(A)$: the vector b points out of the subspace $\mathcal{R}(A)$ spanned by the columns $c_1, c_2, \ldots, c_n$. This is actually the normal situation in problems with measurement noise. Per Christian Hansen Algebraic Iterative Methods 36 / 51

Overview of Systems: m ≤ n.
m < n (underdetermined):
  Full rank, r = m: always consistent; always infinitely many solutions.
  Rank deficient, r < m: can be inconsistent; no solution or infinitely many solutions.
m = n (square):
  Full rank, r = m = n: always consistent; always a unique solution.
  Rank deficient, r < m = n: can be inconsistent; no solution or infinitely many solutions.
The system is inconsistent when $b \notin \mathcal{R}(A)$. There is a unique solution only if r = m = n.
Per Christian Hansen Algebraic Iterative Methods 37 / 51

Overview of Systems: m > n m > n Overdetermined = Full rank r = n Can be inconsistent. No solution or a unique solution. Rank deficient r < n Can be inconsistent. No solution or ininitely many solutions The system is inconsistent when b / R(A). There is a unique solution only if r = n and the system is consistent. Per Christian Hansen Algebraic Iterative Methods 38 / 51

The Least Squares Solution. We must define a unique solution for inconsistent systems! Assume that $b = A \bar{x} + e$ where e is zero-mean Gaussian noise. The best linear unbiased estimate of $\bar{x}$ is the solution to the least squares problem:
$x_{\mathrm{LS}} = \arg\min_x \tfrac{1}{2} \|b - A x\|_2^2,$
and $x_{\mathrm{LS}}$ is unique when r = n. Geometrically, this corresponds to finding $x_{\mathrm{LS}}$ such that $A x_{\mathrm{LS}} \in \mathcal{R}(A)$ is orthogonal to the residual vector $b - A x_{\mathrm{LS}}$.
Per Christian Hansen Algebraic Iterative Methods 39 / 51

Computing the Least Squares Solution. The requirement that $A x_{\mathrm{LS}} \perp (b - A x_{\mathrm{LS}})$ leads to:
$(A x_{\mathrm{LS}})^T (b - A x_{\mathrm{LS}}) = 0 \quad \Leftrightarrow \quad x_{\mathrm{LS}}^T \left(A^T b - A^T A\, x_{\mathrm{LS}}\right) = 0,$
which means that $x_{\mathrm{LS}}$ is the solution to the normal equations:
$A^T A\, x = A^T b \quad \Leftrightarrow \quad x_{\mathrm{LS}} = (A^T A)^{-1} A^T b.$
$x_{\mathrm{LS}}$ exists and is unique when $A^T A$ is invertible, which is the case when r = n (i.e., the system is over-determined and A has full rank). Bonus info: the matrix $A^{\dagger} = (A^T A)^{-1} A^T$ is called the pseudoinverse (or Moore-Penrose inverse) of A.
Per Christian Hansen Algebraic Iterative Methods 40 / 51

The Minimum-Norm Least Squares Solution. If r < n we can define a unique minimum-norm least squares solution by:
$x^{0}_{\mathrm{LS}} = \arg\min_x \|x\|_2 \quad \text{subject to} \quad A^T A\, x = A^T b.$
Example. Consider again the problem
$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} x = \begin{bmatrix} 14 \\ 20 \\ 50 \end{bmatrix} \quad \text{with} \quad r = 2 < n = 3.$
Here $x_{\mathrm{LS}}$ is not unique, and all least squares solutions have the form
$x_{\mathrm{LS}} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \alpha \ \text{arbitrary}.$
The minimum-norm least squares solution $x^{0}_{\mathrm{LS}}$ is obtained by setting α = 0.
Per Christian Hansen Algebraic Iterative Methods 41 / 51
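A quick numerical check of this example: NumPy's pseudoinverse (like np.linalg.lstsq with rcond=None) returns the minimum-norm least squares solution.

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
b = np.array([14., 20., 50.])

x0 = np.linalg.pinv(A) @ b                   # minimum-norm least squares solution
print(np.round(x0, 4))                       # approximately (3, 2, 1)
print(x0 @ np.array([1., -2., 1.]))          # ~0: x0 is orthogonal to the null space
```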

Weighted Least Squares Solutions and Cimmino. Recall our definition of the diagonal matrix $M = \mathrm{diag}\left(m \|r_i\|_2^2\right)$. We also define the weighted least squares problem
$\min_x \tfrac{1}{2} \|M^{-1/2} (A x - b)\|_2^2 \quad \Leftrightarrow \quad (A^T M^{-1} A)\, x = A^T M^{-1} b,$
and the corresponding solution $x_{\mathrm{LS},M} = (A^T M^{-1} A)^{-1} A^T M^{-1} b$. Similarly we define the minimum-norm weighted least squares solution
$x^{0}_{\mathrm{LS},M} = \arg\min_x \|x\|_2 \quad \text{subject to} \quad A^T M^{-1} A\, x = A^T M^{-1} b.$
The full picture of Cimmino's method:
r = n = m: convergence to $A^{-1} b$.
r = n < m and $b \in \mathcal{R}(A)$: convergence to $x_{\mathrm{LS}}$.
r = n < m and $b \notin \mathcal{R}(A)$: convergence to $x_{\mathrm{LS},M}$.
r < min(m, n): convergence to $x^{0}_{\mathrm{LS},M}$.
Per Christian Hansen Algebraic Iterative Methods 42 / 51

The Optimization Viewpoint. Kaczmarz, Cimmino and similar algebraic iterative methods are usually considered as solvers for systems of linear equations. But it is more convenient to consider them as optimization methods. Within this framework we can easily handle common extensions: We can introduce a relaxation parameter or step-length parameter in the algorithm which controls the size of the updating and, as a consequence, the convergence of the method: a constant ω, or a parameter $\omega_k$ that changes with the iterations. We can also, in each updating step, incorporate a projection $P_C$ onto a suitably chosen convex set C that reflects prior knowledge, such as the positive orthant $R^n_+$ (nonnegative solutions) or the n-dimensional box $[0,1]^n$ (solution elements in [0,1]). We can introduce other norms than the 2-norm $\|\cdot\|_2$, which can improve the robustness of the method. Per Christian Hansen Algebraic Iterative Methods 43 / 51

Example: Robust Solutions with the 1-norm. The 1-norm is well suited for handling outliers in the data:
$\min_x \|A x - b\|_1, \qquad \|A x - b\|_1 = \sum_{i=1}^{m} |r_i x - b_i|.$
Consider two over-determined noisy problems with the same matrix:
$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \\ 1 & 6 & 36 \end{bmatrix}, \quad A \bar{x} = \begin{bmatrix} 6 \\ 17 \\ 34 \\ 57 \\ 86 \\ 121 \end{bmatrix}, \quad b = \begin{bmatrix} 6.0001 \\ 17.0285 \\ 33.9971 \\ 57.0061 \\ 85.9965 \\ 120.9958 \end{bmatrix}, \quad b^{o} = \begin{bmatrix} 6.0001 \\ 17.2850 \\ 33.9971 \\ 57.0061 \\ 85.9965 \\ 120.9958 \end{bmatrix}.$
Least squares solutions $x_{\mathrm{LS}}$ and $x^{o}_{\mathrm{LS}}$; 1-norm solutions $x_1$ and $x^{o}_1$:
$x_{\mathrm{LS}} = \begin{bmatrix} 1.0041 \\ 2.0051 \\ 2.9989 \end{bmatrix}, \quad x^{o}_{\mathrm{LS}} = \begin{bmatrix} 1.0811 \\ 2.0151 \\ 2.9943 \end{bmatrix}, \quad x_1 = \begin{bmatrix} 0.9932 \\ 2.0087 \\ 2.9986 \end{bmatrix}, \quad x^{o}_1 = \begin{bmatrix} 0.9932 \\ 2.0088 \\ 2.9986 \end{bmatrix}.$
Per Christian Hansen Algebraic Iterative Methods 44 / 51
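One simple way to compute an approximation to the 1-norm solution is iteratively reweighted least squares, where rows with large residuals are repeatedly down-weighted. This is just one possible approach, sketched here to reproduce the robust behavior above; it is not necessarily the method used to generate those numbers.

```python
import numpy as np

def irls_l1(A, b, n_iter=100, eps=1e-8):
    """Approximate arg min_x ||A x - b||_1 by iteratively reweighted least squares."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)          # start from the 2-norm solution
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(b - A @ x), eps)   # large residuals get small weights
        sw = np.sqrt(w)
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x

t = np.arange(1.0, 7.0)
A = np.column_stack([np.ones(6), t, t**2])             # the 6 x 3 matrix from the slide
b_o = np.array([6.0001, 17.2850, 33.9971, 57.0061, 85.9965, 120.9958])  # with the outlier
x_ls, *_ = np.linalg.lstsq(A, b_o, rcond=None)
print(np.round(x_ls, 4))              # pulled away from (1, 2, 3) by the outlier
print(np.round(irls_l1(A, b_o), 4))   # close to (1, 2, 3)
```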

The Least Squares Problem Revisited. The objective function and its gradient are
$F(x) = \tfrac{1}{2} \|b - A x\|_2^2, \qquad \nabla F(x) = -A^T (b - A x).$
The method of steepest descent for $\min_x F(x)$, with starting vector $x^{(0)}$, performs the updates
$x^{(k+1)} = x^{(k)} - \omega_k \nabla F\left(x^{(k)}\right) = x^{(k)} + \omega_k A^T \left(b - A x^{(k)}\right),$
where $\omega_k$ is a step-length parameter that can depend on the iteration. This is also known as Landweber's method; it corresponds to M = I in Cimmino's method. Cimmino's method corresponds to the weighted problem:
$F_M(x) = \tfrac{1}{2} \|M^{-1/2} (b - A x)\|_2^2, \qquad \nabla F_M(x) = -A^T M^{-1} (b - A x).$
Per Christian Hansen Algebraic Iterative Methods 45 / 51

Incorporating Simple Constraints. We can include constraints on the elements of the reconstructed image. Assume that we can write the constraint as $x \in C$, where C is a convex set; this includes two very common special cases:
Non-negativity constraints. The set $C = R^n_+$ corresponds to $x_i \ge 0$, i = 1, 2, ..., n.
Box constraints. The set $C = [0, 1]^n$ (the n-dimensional box) corresponds to $0 \le x_i \le 1$, i = 1, 2, ..., n.
Per Christian Hansen Algebraic Iterative Methods 46 / 51

The Projected Algorithms. Both algorithms below solve $\min_{x \in C} \|M^{-1/2} (b - A x)\|_2$.
Projected gradient algorithm ($\omega_k < 2 / \|A^T M^{-1} A\|_2$)
  x^(0) = initial vector
  for k = 0, 1, 2, ...
    $x^{(k+1)} = P_C\left( x^{(k)} + \omega_k A^T M^{-1} \left(b - A x^{(k)}\right) \right)$
  end
Projected incremental gradient (Kaczmarz) algorithm ($\omega_k < 2$)
  x^(0) = initial vector
  for k = 0, 1, 2, ...
    i = k (mod m)
    $x^{(k+1)} = P_C\left( x^{(k)} + \omega_k \frac{b_i - r_i x^{(k)}}{\|r_i\|_2^2}\, r_i \right)$
  end
Per Christian Hansen Algebraic Iterative Methods 47 / 51
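A sketch of the first algorithm with a box projection (np.clip); the step length is estimated from the spectral norm so that it stays below the stated bound. The small noisy test problem is an assumption for illustration.

```python
import numpy as np

def projected_gradient(A, b, project, n_iter=2000, omega=None):
    """Projected weighted gradient iteration
       x <- P_C( x + omega * A^T M^{-1} (b - A x) ),  M = diag(m ||r_i||_2^2)."""
    m, n = A.shape
    Minv = 1.0 / (m * np.sum(A * A, axis=1))
    if omega is None:
        omega = 1.0 / np.linalg.norm((A.T * Minv) @ A, 2)   # safely below 2 / ||A^T M^{-1} A||_2
    x = np.zeros(n)
    for _ in range(n_iter):
        x = project(x + omega * (A.T @ (Minv * (b - A @ x))))
    return x

box = lambda z: np.clip(z, 0.0, 1.0)        # projection P_C onto the box [0, 1]^n
A = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0]])
b = A @ np.array([0.3, 0.8]) + np.array([0.01, -0.02, 0.01])   # noisy data
print(projected_gradient(A, b, box))        # stays in the box, close to (0.3, 0.8)
```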

Iteration-Dependent Relaxation Parameter $\omega_k$. The basic Kaczmarz algorithm gives a cyclic and non-convergent behavior. Consider the example from slide 24 with: $\omega_k = 0.8$ (independent of k) and $\omega_k = 1/\sqrt{k}$, k = 1, 2, ... The rightmost plot is a zoom of the middle plot. With a fixed $\omega_k < 1$ we still have a cyclic non-convergent behavior. With the diminishing relaxation parameter $\omega_k = 1/\sqrt{k} \to 0$ as $k \to \infty$ the iterates converge to the weighted least squares solution $x_{\mathrm{LS},M}$. Per Christian Hansen Algebraic Iterative Methods 48 / 51

Overview of Convergence (see also slides 37-38). What the unconstrained methods converge to, with starting vector $x^{(0)} = 0$:
Kac: Kaczmarz's method with a fixed relaxation parameter.
K_d: Kaczmarz's method with a diminishing parameter.
Cim: Cimmino's method with a fixed relaxation parameter.
r < min(m, n), $b \in \mathcal{R}(A)$ (any of m < n, m = n, m > n): all three methods converge to $x^{0}_{\mathrm{LS}} = x^{0}_{\mathrm{LS},M}$.
r < min(m, n), $b \notin \mathcal{R}(A)$: Kac is cyclic (non-convergent); K_d converges to $x^{0}_{\mathrm{LS},M}$; Cim converges to $x^{0}_{\mathrm{LS},M}$.
r = min(m, n), $b \in \mathcal{R}(A)$: for m < n the methods converge to $x^{0}_{\mathrm{LS}} = x^{0}_{\mathrm{LS},M}$; for m = n to $A^{-1} b$; for m > n to $x_{\mathrm{LS}} = x_{\mathrm{LS},M}$.
r = min(m, n), $b \notin \mathcal{R}(A)$ (only possible for m > n): Kac is cyclic (non-convergent); K_d converges to $x_{\mathrm{LS},M}$; Cim converges to $x_{\mathrm{LS},M}$.
NB: for over-determined systems with $b \notin \mathcal{R}(A)$, $x^{0}_{\mathrm{LS},M} \ne x^{0}_{\mathrm{LS}}$ and $x_{\mathrm{LS},M} \ne x_{\mathrm{LS}}$.
Per Christian Hansen Algebraic Iterative Methods 49 / 51

Last Slide: Other Projected Gradient Algorithms. Many algorithms proposed in the literature (Landweber, CAV, DROP, SART, SIRT, ...) are special cases of the following general formulation.
General projected gradient algorithm ($\omega_k < 2$)
  x^(0) = initial vector
  for k = 0, 1, 2, ...
    $x^{(k+1)} = P_C\left( x^{(k)} + \omega_k D^{-1} A^T M^{-1} \left(b - A x^{(k)}\right) \right)$
  end
Of particular interest is the method SIRT, in which:
$D = \mathrm{diag}\left(\|c_j\|_1\right), \quad \|c_j\|_1 = \sum_{i=1}^{m} a_{ij}, \qquad M = \mathrm{diag}\left(\|r_i\|_1\right), \quad \|r_i\|_1 = \sum_{j=1}^{n} a_{ij}.$
SIRT is implemented in the ASTRA software, to be discussed tomorrow.
Per Christian Hansen Algebraic Iterative Methods 50 / 51
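A compact sketch of this SIRT-type iteration (not the ASTRA implementation), with the row and column 1-norms computed directly from A; an optional projection P_C can be passed in. The tiny test problem is an assumption for illustration.

```python
import numpy as np

def sirt(A, b, n_iter=500, omega=1.0, project=None):
    """SIRT-type iteration x <- P_C( x + omega * D^{-1} A^T M^{-1} (b - A x) )
    with D = diag(||c_j||_1) and M = diag(||r_i||_1)."""
    Dinv = 1.0 / np.sum(np.abs(A), axis=0)     # 1 / column 1-norms
    Minv = 1.0 / np.sum(np.abs(A), axis=1)     # 1 / row 1-norms
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + omega * Dinv * (A.T @ (Minv * (b - A @ x)))
        if project is not None:
            x = project(x)
    return x

A = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0]])
x_exact = np.array([0.3, 0.8])
print(sirt(A, A @ x_exact, project=lambda z: np.clip(z, 0.0, 1.0)))   # ~ (0.3, 0.8)
```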

A Few References
T. Elfving, T. Nikazad, and C. Popa, A class of iterative methods: semi-convergence, stopping rules, inconsistency, and constraining; in Y. Censor, Ming Jiang, and Ge Wang (Eds.), Biomedical Mathematics: Promising Directions in Imaging, Therapy Planning, and Inverse Problems, Medical Physics Publishing, Madison, Wisconsin, 2010.
P. C. Hansen and M. Saxild-Hansen, AIR Tools - A MATLAB package of algebraic iterative reconstruction methods, J. Comp. Appl. Math., 236 (2012), pp. 2167-2178.
G. T. Herman, Fundamentals of Computerized Tomography: Image Reconstruction from Projections, Springer, New York, 2009.
M. Jiang and G. Wang, Convergence studies on iterative algorithms for image reconstruction, IEEE Trans. Medical Imaging, 22 (2003), pp. 569-579.
A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, IEEE Press, 1988.
Per Christian Hansen Algebraic Iterative Methods 51 / 51