Fast Algorithms for Regularized Minimum Norm Solutions to Inverse Problems
Irina F. Gorodnitsky, Cognitive Sciences Dept., University of California, San Diego, La Jolla, CA

Dmitry Beransky, Elect. and Computer Engineering Dept., University of California, San Diego, La Jolla, CA

Abstract

The computational cost of solving biomedical inverse problems is extremely high. As a result, expensive high-end computational platforms are required for processing, and at times a trade-off must be made between accuracy and cost of computation. In this paper we present two fast computational algorithms for solving regularized inverse problems. The computational advantages are obtained by exploiting the extreme discrepancy between the dimension of the solution space and that of the measured data sets. The algorithms implement two common regularization procedures, Tikhonov regularization and Truncated Singular Value Decomposition (TSVD), and do not compromise the numerical accuracy of the solutions. Comparisons of the costs of the conventional and the proposed algorithms are given. Although the algorithms are presented in the context of biomedical inverse problems, they are applicable to any inverse problem with similar characteristics, such as geophysical inverse problems and non-destructive evaluation.

Solving biomedical inverse problems requires operating on large matrices, and the computational cost of these problems is extremely high. This increases the cost of the technology, as expensive high-end computational platforms are required, and the quality of the solutions may at times be compromised for the sake of manageable computational costs. We measure computational cost in terms of the time it takes to compute a solution. There are a number of other ways in which computational cost may be measured; in numerical analysis, the number of floating point operations (flops) is commonly used.
The processing-time measure that we use here, however, is the most accurate indicator of the resources used in a computation and is the most relevant to the end user. We report on a study of the cost of inverse solutions for ill-conditioned problems and present two fast algorithms that improve the computational cost compared to the existing methods. (This work was supported by the ONR grant no. N.) The work here is developed in the context of biomedical inverse problems, but other physical inverse problems, most notably in geophysics and non-destructive evaluation, share the same mathematical characteristics, and the results of this paper apply directly to those problems. The two characteristic properties of physical inverse problems are 1) their severely ill-posed nature, with a great discrepancy between the dimensions of the solution space and the data (a ratio of 10^4 or more), and 2) severe ill-conditioning of the forward model. Solutions to ill-posed problems are non-unique, and we focus our attention on the computation of the most commonly used solutions, the minimum 2-norm based solutions. Ill-conditioned inverse problems require regularization to prevent the solutions from being excessively sensitive to noise in the data. While efficient algorithms exist for computing inverses, the role of regularization in increasing the cost of the computations has not been well considered. The two regularization techniques that are most widely employed are Tikhonov regularization and Truncated Singular Value Decomposition (TSVD). In its standard implementation, Tikhonov regularization requires over an order of magnitude more operations than the computation of the unregularized solution. Another factor that is commonly not considered in numerical analysis is the cost of memory access. This cost becomes significant when data sets are large, as the cost of accessing a number on disk exceeds the cost of a floating point operation by a factor of about 10^5.
Memory-access requirements depend on the size of the processor memory and can generate significant cost overhead for mid-range platforms. Here we develop efficient algorithms for the two common regularization techniques. For Tikhonov regularization we derive a new algorithm, termed the Efficient Tikhonov Regularization (ETR) algorithm, which reduces the number of floating point operations by approximately an order of magnitude. For TSVD, we propose modifications of the existing method that significantly reduce the memory-access requirements. In both cases the new algorithms provide a
significant reduction in the cost of regularized inverse computations. While the standard implementations of Tikhonov regularization are higher in cost than TSVD, the new ETR algorithm is shown to be more efficient than TSVD in terms of flop count.

1 Background

1.1 The inverse problem

Linear inverse problems can be stated mathematically as the estimation of a signal p from its noisy linear transformation

    b = A p + n,                                  (1)

where A is an m x n linear transformation matrix acting on p, and the vector b represents a set of measurements consisting of the exact values of the transform A p plus the additive noise n. In most biomedical inverse problems A is severely underdetermined, i.e. m << n, with m on the order of 10^2 and n on the order of 10^5 to 10^7. We will refer to such matrices as very wide matrices. The large computational cost arises from operating on such matrices, and these are the problems we consider here. Examples of such problems include bio-electromagnetic imaging of the heart (MCG) and of brain function (EEG and MEG).

When m < n, (1) has no unique solution. We focus our attention here on finding minimum 2-norm (or seminorm) based solutions to (1), as these are by far the most widely used. These solutions are found by satisfying the constraint

    min_p ||W (p - x_0)||   subject to   b = A p.  (2)

The vector x_0 is a reference model that is used in some cases, but more often it is set to zero. W is often the identity matrix; it can also represent a derivative operator, such as a Laplacian, in which case W is either a square banded matrix or a general matrix of weights which bias the solution [3]. In physical tomography W is typically square. To allow a uniform treatment of the constraints represented by (2), it is convenient to transform the above problem into the standard form. For square W this transformation is straightforward: Ā = A W^+ and x = W (p - x_0), where ^+ indicates the Moore-Penrose pseudo-inverse. We use the pseudo-inverse in place of a regular inverse to accommodate a possibly rank-deficient W. The problem in standard form then is

    min ||x||   subject to   Ā x = b,             (3)

and p is recovered by p = W^+ x + x_0.
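For concreteness, the minimum 2-norm solution of a standard-form problem such as (3) can be computed directly on a toy example (a NumPy sketch for illustration; the random matrix here is well conditioned, unlike the biomedical case, and A stands in for the standard-form matrix Ā):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 50                       # m << n: a "very wide" system
A = rng.standard_normal((m, n))    # stands in for the standard-form matrix
b = rng.standard_normal(m)

# Minimum 2-norm solution x = A^T (A A^T)^{-1} b
x = A.T @ np.linalg.solve(A @ A.T, b)

assert np.allclose(A @ x, b)                  # fits the data exactly
assert np.allclose(x, np.linalg.pinv(A) @ b)  # equals the pseudoinverse solution
```

Note that only the small m x m matrix A A^T is ever inverted; this exploitation of m << n is the theme of the algorithms below.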
(All norms || · || indicate 2-norms unless stated otherwise.)

The exact solution to (3) is

    x = Ā^+ b,                                    (4)

where Ā^+ = Ā^T (Ā Ā^T)^{-1}. Unfortunately, this is not an acceptable solution when Ā is ill-conditioned, that is, when the condition number of Ā is large. Ill-conditioned Ā are to be expected in biomedical inverse problems. In this case even a small amount of noise in the data can produce arbitrarily large variations in the solution x, rendering these solutions useless. To avoid this problem, regularization must be used, in which the original Ā is replaced by a slightly different, well-conditioned matrix Ā_reg. Regularization aims to provide some kind of optimal trade-off between the error in x due to the change in Ā and the error due to noise. In the next section we discuss regularization techniques and their costs, but first we review the main factors that contribute to the cost of numerical computations.

1.2 Workload and cost of computation

The basic units of computer instructions are integer operations. In numerical analysis it is standard to evaluate the complexity and cost of algorithms in terms of the number of flops, and this cost is assumed to be representative of the computational time of the algorithm. Although flops do not reveal the number of integer operations it takes to complete a given instruction, the flop count is nevertheless the accepted and perhaps the best readily obtainable analytic measure of algorithmic complexity. Flop count, however, becomes a poor indicator of execution time when computations involve large data sets. Other factors, most importantly disk memory access, can affect the execution time of an algorithm far more than an increase in the number of flops, and thus must be taken into account. Because memory access is not often considered in numerical algorithm analysis and is rarely described in the literature, we provide a brief overview of this process.
The variables used in a program are stored in cache and working memory and, when these two storage devices are full, on disk. Access to cache is the fastest, but its hardware is expensive, so its size is very small. Access to working memory is also fast; the size of this memory is indicated by the RAM of a given system. The portion of the data which does not fit in cache and working memory is put on the disk. The retrieval of data from the disk is very expensive. When a required number is not found in the working memory, program execution is suspended and the kernel starts an I/O operation from the disk. The retrieval request is registered and put into a queue while the disk I/O processor is executing other instructions. Once the retrieval from the disk is in process, a chunk of storage called a page, which contains the requested
number is retrieved and substituted for some other page in the working memory. Only after the processor accesses the desired number does execution resume. The whole process is called a swap and involves on the order of 10^3 to 10^6 integer operations, compared to about 10^2 operations for a flop. The speed of a swap is measured in milliseconds, while the speed of a flop is measured in nanoseconds. Vectors or matrices that exceed a certain dimension cannot be stored entirely in the working memory and must be split between this memory and the disk. Operations on such vectors involve a large number of swaps, which creates a bottleneck in the speed of processing. Programs can be designed to minimize the number of swaps by maximizing access of adjacent array entries in the algorithm. Nevertheless, swaps cannot be avoided in certain matrix operations. One of the more expensive operations in terms of swaps is the matrix transpose, because the source and the destination of an entry in the transposed matrix are likely to reside in different storage areas. In our development of fast algorithms we consider two factors: the memory-access requirements and the number of flops. We control memory-access requirements by minimizing the number of very wide matrix transposes in the algorithms. In simulations, the number of matrix transposes correlates well with the total disk-access requirements in regularization algorithms.

2 Regularization

Solutions can be regularized in a number of ways, but only two techniques are predominantly used. We describe these methods next. Tikhonov regularization [2] is the most common method. The objective min ||Ā x - b|| is modified to include a misfit parameter, leading to the regularized problem

    min_x ( ||Ā x - b||^2 + λ^2 ||x||^2 ),        (5)

where λ is the regularization parameter; before (5) can be solved, the value of λ must be chosen.
The solution depends critically on this choice, and only a small range of values produces a good approximation to the true x of the noiseless case, if such an approximation can be found at all. Finding the optimal λ is not simple. The existing methods for this can be subdivided into two groups. One group of methods assumes that the noise n is known or can be estimated. This assumption often cannot be fulfilled, and the approach has other pitfalls that can be deduced from the analysis in [4]. We therefore consider only the second approach to finding λ, where no knowledge of n is assumed. All the methods in this group can be equated to finding the corner of an L-curve, which is the plot of the solution norm ||x|| versus the residual ||Ā x - b|| for various values of λ. The optimal value of λ occurs at the sharp L-shaped corner of this plot. To find this corner, (5) must be solved repeatedly, typically more than 10 times, for different values of λ. Thus a direct implementation of Tikhonov regularization increases the cost of finding a solution by over an order of magnitude.

The solution to (5) can be found by direct differentiation, which leads to the normal equations

    (Ā^T Ā + λ^2 I) x = Ā^T b.                    (6)

These equations are solved by first performing the Cholesky factorization R^T R = Ā^T Ā + λ^2 I, and then solving the triangular systems R^T y = Ā^T b and R x = y. The problem with this approach is that the accuracy of the Cholesky factorization is governed by the condition number of Ā^T Ā, which is the square of the condition number of Ā. When Ā is ill-conditioned, this can lead to a significant decrease in the numerical stability of the solution [4]. Thus the use of the normal equations to solve (5) is not optimal. Instead, the stacked matrix can be factored directly. The QR factorization

    [Ā; λI] = Q R                                 (7)

is most frequently used. Then x = R^{-1} Q_1^T b, where Q_1 is the top section of Q and R is upper triangular. x can also be found simply in two steps, by solving the upper triangular systems R^T y = Ā^T b and R x = y.
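The two solution routes just described, the normal equations (6) and the least-squares problem on the stacked matrix (7), produce the same x, which can be checked on a toy problem (a NumPy sketch; `lstsq` stands in here for an explicit QR solve):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 40
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.1                          # regularization parameter

# Route 1: normal equations (A^T A + lam^2 I) x = A^T b
x_ne = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)

# Route 2: least squares on the stacked matrix [A; lam*I] with rhs [b; 0]
S = np.vstack([A, lam * np.eye(n)])
x_qr, *_ = np.linalg.lstsq(S, np.concatenate([b, np.zeros(n)]), rcond=None)

assert np.allclose(x_ne, x_qr)
```

On a well-conditioned toy matrix the two agree to machine precision; the stability advantage of the factored route only shows when A is ill-conditioned.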
A slightly more efficient method than QR, but one that is rarely used, is bidiagonalization by means of left and right orthogonal transformations; x in this case is the solution to the resulting sparse system. Note that these factorizations require a transpose of the wide matrix Ā. Two algorithms for computing the QR factorization are used, the Householder and the Modified Gram-Schmidt (MGS) algorithms. Householder QR requires about twice the number of flops of MGS, but it is more stable numerically. Because the stacked matrix [Ā; λI] is full rank and well conditioned for an appropriate λ, no column pivoting is needed in the factorization, and the cheaper MGS algorithm is also acceptable here. Note, however, that MGS will not produce an orthogonal Q if [Ā; λI] is poorly conditioned. The cost of regularization via both factorization methods is shown in Table I.

Truncated Singular Value Decomposition (TSVD) is the second method used for regularization. Here the ill-conditioned Ā is replaced by a well-conditioned rank-k approximation Ā_k, which is closest to Ā in the 2-norm sense. This approximation is given by the SVD expansion of Ā truncated to the first k components:

    Ā_k = σ_1 u_1 v_1^T + ... + σ_k u_k v_k^T = U_k Σ_k V_k^T,   (8)
where the matrices U_k and V_k^T are composed of the k left and right singular vectors; the subscripts indicate the number of columns and rows, respectively, in these matrices. The kth-order TSVD solution is then given by

    x_k = V_k Σ_k^{-1} U_k^T b,                   (9)

where u_i and v_i are the left and right singular vectors and σ_i are the singular values. The regularization parameter in this method is k, which determines the number of singular subspaces of Ā used in computing x_k. The various methods for selecting k can all ultimately be interpreted as satisfying the L-curve criterion, as in the case of Tikhonov regularization. In the case of TSVD, however, the different kth-order decompositions are found from a single SVD. Hence TSVD regularization does not significantly add to the cost of the unregularized solution to (1).

By far the most efficient algorithm for computing the SVD of matrices with m << n is R-SVD [1]. The algorithm first performs the QR factorization Ā^T = Q R, where R is upper triangular. R is then bidiagonalized, B = U_1^T R V_1, where U_1 and V_1 are orthogonal matrices and B is upper bidiagonal. Defining U = Q U_1 gives the equivalent bidiagonalization of Ā^T. The SVD is then computed from the bidiagonal B using the Golub-Kahan algorithm [1], and the kth-order solution is given by (9). When m << n, the QR of Ā^T must be taken to preserve the small computational cost of R-SVD; the solution is then formed from the transpose of the factored expression, and Q need not be formed explicitly, only applied to the factors as they develop. A TSVD solution via R-SVD requires one wide matrix transpose, with the flop count given in Table I. The majority of the cost of the SVD when m << n is contained in the QR factorization and the transformation of Ā.
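A kth-order TSVD solution (9) is a few lines on top of a library SVD (a NumPy sketch; a full SVD is used here in place of R-SVD, and the sizes are toy):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
k = 3                              # truncation order (regularization parameter)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_k = Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])   # x_k = V_k S_k^{-1} U_k^T b

# Equivalently, x_k is the pseudoinverse solution for the rank-k
# approximation A_k that is closest to A in the 2-norm
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
assert np.allclose(x_k, np.linalg.pinv(A_k) @ b)
```

Because all truncation orders come from the same factorization, sweeping k to locate the L-curve corner costs almost nothing beyond the single SVD.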
We can see that although the SVD is more expensive than a QR factorization, TSVD is overall a cheaper method of regularization than Tikhonov, because only a single factorization of Ā is required.

3 Efficient regularization algorithms

Here we describe two efficient regularization algorithms, for Tikhonov regularization and for TSVD, for very wide matrices. Both algorithms are based on the LQ factorization, which we describe next.

3.1 LQ factorization

The LQ factorization provides an orthogonalization of wide matrices (m < n) and is defined as

    Ā = L Q,

where L is an m x m lower triangular matrix and Q consists of the top m rows of an orthogonal matrix. The LQ factorization of Ā is equivalent to the QR factorization of Ā^T, i.e. Ā^T = Q^T L^T. Householder reflections and Givens rotations can be used to compute the LQ factorization analogously to the QR factorization, so the algorithms for computing LQ have the same numerical and workload properties as the corresponding QR factorization algorithms. The advantage of LQ is that it can be applied directly to Ā when m << n, avoiding the expensive transpose of the matrix.

3.2 Efficient Tikhonov Regularization (ETR)

Here we describe a novel fast algorithm for Tikhonov regularization. Since a regularized solution seeks to fit the data within some residual interval, we can rewrite the regularized problem (5) as

    min ||s||   subject to   [Ā  λI] s = b,

where r is the residual r = b - Ā x and s = [x; r/λ]. The standard approach to this problem is to solve it for a range of λ by performing repeated factorizations, as described above. Instead, in ETR we perform a single LQ factorization of Ā, Ā = L Q. Then the factorization of the small m x 2m system [L  λI]
is done repeatedly for the different values of λ. The solution for each λ is t = [L^T z; λ z], where z is obtained from the factorization of [L  λI] and solves the small m x m system (L L^T + λ^2 I) z = b; the solution component x is then recovered as x = Q^T L^T z. Note that we only need to find z, not x, for each value of λ. z is the solution of a small system and is cheap to compute, and it can be used to find the corner of the L-curve and hence the optimal λ; the final solution need then be computed only for the optimal λ. The total cost of the factorizations in ETR is shown in Table I. The LQ factorization of Ā dominates the flop count of the algorithm. As we can see, ETR produces over an order of magnitude reduction in flop count compared to the standard implementations of Tikhonov regularization.

3.3 Economy TSVD

Although taking a transpose of a very large matrix cannot be avoided entirely in TSVD, significant savings can be achieved by using the LQ factorization of Ā instead of the QR factorization of Ā^T. We call the SVD via LQ factorization the L-SVD algorithm. With this factorization the solution is

    x_k = (V_k^T Q)^T Σ_k^{-1} U_k^T b,

where L = U Σ V^T is the SVD of the small triangular factor L. The matrix V_k^T Q is k x n, where typically k << m. The saving in taking the transpose of this matrix rather than of Ā is that there are only k rather than m rows to transpose. In simulations using 32 MB RAM platforms and matrices Ā with n on the order of 10^4, we observed an order of magnitude cost saving using the L-SVD algorithm compared to R-SVD.

3.4 Comparison of the two novel factorizations

We cannot make a general statement about the relative cost of the two algorithms, but we can observe a trade-off between the increases in RAM and in the ratio n/m on the one hand and the cost advantage of TSVD on the other. It is up to the user to determine where the break point in the trade-off occurs for his or her computational platform.

3.5 Mixed compiler implementation

Taking transposes of large matrices can also be avoided entirely by mixing Fortran and C subroutines in one program. Because Fortran uses column-major storage, i.e. it stores matrices column by column, while C is row-major, a matrix stored by one compiler is interpreted naturally as a transpose by the other compiler.
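This storage-order duality is easy to verify (a NumPy sketch, since NumPy supports both C and Fortran orders):

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(3, 4)   # row-major (C) layout
# Reading the same sequence of numbers back in column-major (Fortran)
# order, with the dimensions swapped, yields the transpose for free:
At = np.reshape(A.ravel(order='C'), (4, 3), order='F')
assert np.array_equal(At, A.T)
```

No element of A is moved; only the interpretation of the stored sequence changes.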
Thus invoking C subroutines from Fortran, and vice versa, for operations requiring a matrix transpose avoids taking the transpose explicitly. In this case, ETR becomes the method of choice for regularization.

The cost of the factorizations for the two novel regularization methods is shown in Table I. We can see that even in terms of flop count the ETR algorithm is slightly cheaper than TSVD, and for larger ratios n/m the flop count for ETR is significantly less than for TSVD. The cost of a wide matrix transpose is hard to evaluate, because so much depends on the details of the implementation and the computing platform.

Table I. Regularization methods and their factorizations (the matrix-size and flop-count entries of the original table are illegible in this transcription):

Regularization method       Factorization            Wide matrix transposes
Tikhonov                    Householder QR           complete (m x n)
Tikhonov                    MGS QR                   complete (m x n)
Efficient Tikhonov (ETR)    Householder or MGS LQ    complete (m x n)
TSVD (R-SVD)                QR                       complete (m x n)
TSVD (L-SVD)                LQ                       partial (k x n)

References

[1] T. F. Chan. An improved algorithm for computing the singular value decomposition. ACM Trans. Math. Soft., 8:72-83, 1982.
[2] M. Foster. An application of the Wiener-Kolmogorov smoothing theory to matrix inversion. J. SIAM, 9, 1961.
[3] I. F. Gorodnitsky, J. S. George, H. A. Schlitt, and P. S. Lewis. A weighted iterative algorithm for neuromagnetic imaging. Proc. IEEE Satellite Symposium on Neuroscience and Technology, Lyon, France, Nov. 1992.
[4] I. F. Gorodnitsky and B. D. Rao. Analysis of regularization error in Tikhonov regularization and truncated singular value decomposition methods. Proc. 28th Asilomar Conf. on Signals, Systems and Computers, 1:25-29, Oct.-Nov. 1994.
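As a numerical check of the ETR scheme of Section 3.2, the sketch below computes the Tikhonov solution for several λ from a single LQ factorization of Ā (a NumPy illustration only: the LQ factor is formed via the QR of the transpose, and a direct m x m solve stands in for the repeated small factorization of [L λI]):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 40
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# One LQ factorization A = L Q (L: m x m lower triangular, Q: m x n)
Qt, R = np.linalg.qr(A.T)
L, Q = R.T, Qt.T
assert np.allclose(A, L @ Q)

for lam in [1e-3, 1e-2, 1e-1, 1.0]:
    # Per-lambda work touches only the small factor L (A A^T = L L^T):
    z = np.linalg.solve(L @ L.T + lam**2 * np.eye(m), b)
    x = Q.T @ (L.T @ z)            # apply Q^T once to recover x
    # Agrees with the direct Tikhonov solution for this lambda
    x_ref = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
    assert np.allclose(x, x_ref)
```

The loop body costs O(m^3) per λ instead of a fresh factorization involving n, which is the source of ETR's order-of-magnitude saving when m << n.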
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper
More informationLecture 8 Fitting and Matching
Lecture 8 Fitting and Matching Problem formulation Least square methods RANSAC Hough transforms Multi-model fitting Fitting helps matching! Reading: [HZ] Chapter: 4 Estimation 2D projective transformation
More informationClassroom Tips and Techniques: Least-Squares Fits. Robert J. Lopez Emeritus Professor of Mathematics and Maple Fellow Maplesoft
Introduction Classroom Tips and Techniques: Least-Squares Fits Robert J. Lopez Emeritus Professor of Mathematics and Maple Fellow Maplesoft The least-squares fitting of functions to data can be done in
More informationF02WUF NAG Fortran Library Routine Document
F02 Eigenvalues and Eigenvectors F02WUF NAG Fortran Library Routine Document Note. Before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised
More informationSparse Solutions to Linear Inverse Problems. Yuzhe Jin
Sparse Solutions to Linear Inverse Problems Yuzhe Jin Outline Intro/Background Two types of algorithms Forward Sequential Selection Methods Diversity Minimization Methods Experimental results Potential
More informationA Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography
1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography
More informationWeek 7 Picturing Network. Vahe and Bethany
Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups
More informationMulti-azimuth velocity estimation
Stanford Exploration Project, Report 84, May 9, 2001, pages 1 87 Multi-azimuth velocity estimation Robert G. Clapp and Biondo Biondi 1 ABSTRACT It is well known that the inverse problem of estimating interval
More informationParallel Implementations of Gaussian Elimination
s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n
More informationAN ALGORITHM FOR BLIND RESTORATION OF BLURRED AND NOISY IMAGES
AN ALGORITHM FOR BLIND RESTORATION OF BLURRED AND NOISY IMAGES Nader Moayeri and Konstantinos Konstantinides Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94304-1120 moayeri,konstant@hpl.hp.com
More informationHumanoid Robotics. Least Squares. Maren Bennewitz
Humanoid Robotics Least Squares Maren Bennewitz Goal of This Lecture Introduction into least squares Use it yourself for odometry calibration, later in the lecture: camera and whole-body self-calibration
More informationParallel Linear Algebra in Julia
Parallel Linear Algebra in Julia Britni Crocker and Donglai Wei 18.337 Parallel Computing 12.17.2012 1 Table of Contents 1. Abstract... 2 2. Introduction... 3 3. Julia Implementation...7 4. Performance...
More informationAdapting to Non-Stationarity in EEG using a Mixture of Multiple ICA Models
Adapting to Non-Stationarity in EEG using a Mixture of Multiple ICA Models Jason A. Palmer 1 Scott Makeig 1 Julie Onton 1 Zeynep Akalin-Acar 1 Ken Kreutz-Delgado 2 Bhaskar D. Rao 2 1 Swartz Center for
More informationEfficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit
Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit Ron Rubinstein, Michael Zibulevsky and Michael Elad Abstract The K-SVD algorithm is a highly effective method of
More informationDECOMPOSITION is one of the important subjects in
Proceedings of the Federated Conference on Computer Science and Information Systems pp. 561 565 ISBN 978-83-60810-51-4 Analysis and Comparison of QR Decomposition Algorithm in Some Types of Matrix A. S.
More information1 INTRODUCTION The LMS adaptive algorithm is the most popular algorithm for adaptive ltering because of its simplicity and robustness. However, its ma
MULTIPLE SUBSPACE ULV ALGORITHM AND LMS TRACKING S. HOSUR, A. H. TEWFIK, D. BOLEY University of Minnesota 200 Union St. S.E. Minneapolis, MN 55455 U.S.A fhosur@ee,tewk@ee,boley@csg.umn.edu ABSTRACT. The
More informationSTEPHEN WOLFRAM MATHEMATICADO. Fourth Edition WOLFRAM MEDIA CAMBRIDGE UNIVERSITY PRESS
STEPHEN WOLFRAM MATHEMATICADO OO Fourth Edition WOLFRAM MEDIA CAMBRIDGE UNIVERSITY PRESS Table of Contents XXI a section new for Version 3 a section new for Version 4 a section substantially modified for
More informationPredicting Web Service Levels During VM Live Migrations
Predicting Web Service Levels During VM Live Migrations 5th International DMTF Academic Alliance Workshop on Systems and Virtualization Management: Standards and the Cloud Helmut Hlavacs, Thomas Treutner
More informationThis leads to our algorithm which is outlined in Section III, along with a tabular summary of it's performance on several benchmarks. The last section
An Algorithm for Incremental Construction of Feedforward Networks of Threshold Units with Real Valued Inputs Dhananjay S. Phatak Electrical Engineering Department State University of New York, Binghamton,
More informationA hybrid GMRES and TV-norm based method for image restoration
A hybrid GMRES and TV-norm based method for image restoration D. Calvetti a, B. Lewis b and L. Reichel c a Department of Mathematics, Case Western Reserve University, Cleveland, OH 44106 b Rocketcalc,
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 20: Sparse Linear Systems; Direct Methods vs. Iterative Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 26
More informationLecture 9 Fitting and Matching
Lecture 9 Fitting and Matching Problem formulation Least square methods RANSAC Hough transforms Multi- model fitting Fitting helps matching! Reading: [HZ] Chapter: 4 Estimation 2D projective transformation
More informationComputational Methods CMSC/AMSC/MAPL 460. Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science
Computational Methods CMSC/AMSC/MAPL 460 Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science Some special matrices Matlab code How many operations and memory
More informationIntel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager
Intel Math Kernel Library (Intel MKL) Sparse Solvers Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager Copyright 3, Intel Corporation. All rights reserved. Sparse
More informationWorkshop - Model Calibration and Uncertainty Analysis Using PEST
About PEST PEST (Parameter ESTimation) is a general-purpose, model-independent, parameter estimation and model predictive uncertainty analysis package developed by Dr. John Doherty. PEST is the most advanced
More information(Creating Arrays & Matrices) Applied Linear Algebra in Geoscience Using MATLAB
Applied Linear Algebra in Geoscience Using MATLAB (Creating Arrays & Matrices) Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional
More informationParallelizing LU Factorization
Parallelizing LU Factorization Scott Ricketts December 3, 2006 Abstract Systems of linear equations can be represented by matrix equations of the form A x = b LU Factorization is a method for solving systems
More informationMemory Management. Reading: Silberschatz chapter 9 Reading: Stallings. chapter 7 EEL 358
Memory Management Reading: Silberschatz chapter 9 Reading: Stallings chapter 7 1 Outline Background Issues in Memory Management Logical Vs Physical address, MMU Dynamic Loading Memory Partitioning Placement
More informationReversible Wavelets for Embedded Image Compression. Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder
Reversible Wavelets for Embedded Image Compression Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder pavani@colorado.edu APPM 7400 - Wavelets and Imaging Prof. Gregory Beylkin -
More informationDense Matrix Algorithms
Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication
More informationEastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy.
Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture CACHE MEMORY Introduction Computer memory is organized into a hierarchy. At the highest
More informationChapter 4. Matrix and Vector Operations
1 Scope of the Chapter Chapter 4 This chapter provides procedures for matrix and vector operations. This chapter (and Chapters 5 and 6) can handle general matrices, matrices with special structure and
More informationContent-based Dimensionality Reduction for Recommender Systems
Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationDiffuse Optical Tomography, Inverse Problems, and Optimization. Mary Katherine Huffman. Undergraduate Research Fall 2011 Spring 2012
Diffuse Optical Tomography, Inverse Problems, and Optimization Mary Katherine Huffman Undergraduate Research Fall 11 Spring 12 1. Introduction. This paper discusses research conducted in order to investigate
More information3.1. Solution for white Gaussian noise
Low complexity M-hypotheses detection: M vectors case Mohammed Nae and Ahmed H. Tewk Dept. of Electrical Engineering University of Minnesota, Minneapolis, MN 55455 mnae,tewk@ece.umn.edu Abstract Low complexity
More informationEECS 442 Computer vision. Fitting methods
EECS 442 Computer vision Fitting methods - Problem formulation - Least square methods - RANSAC - Hough transforms - Multi-model fitting - Fitting helps matching! Reading: [HZ] Chapters: 4, 11 [FP] Chapters:
More informationLecture 27: Fast Laplacian Solvers
Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall
More informationComparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra. Mark Gates, Stan Tomov, Azzam Haidar SIAM LA Oct 29, 2015
Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra Mark Gates, Stan Tomov, Azzam Haidar SIAM LA Oct 29, 2015 Overview Dense linear algebra algorithms Hybrid CPU GPU implementation
More informationModern GPUs (Graphics Processing Units)
Modern GPUs (Graphics Processing Units) Powerful data parallel computation platform. High computation density, high memory bandwidth. Relatively low cost. NVIDIA GTX 580 512 cores 1.6 Tera FLOPs 1.5 GB
More informationCS 770G - Parallel Algorithms in Scientific Computing
CS 770G - Parallel lgorithms in Scientific Computing Dense Matrix Computation II: Solving inear Systems May 28, 2001 ecture 6 References Introduction to Parallel Computing Kumar, Grama, Gupta, Karypis,
More information(Sparse) Linear Solvers
(Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert
More informationRecognition, SVD, and PCA
Recognition, SVD, and PCA Recognition Suppose you want to find a face in an image One possibility: look for something that looks sort of like a face (oval, dark band near top, dark band near bottom) Another
More informationLeast-Squares Fitting of Data with B-Spline Curves
Least-Squares Fitting of Data with B-Spline Curves David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International
More informationAM205: lecture 2. 1 These have been shifted to MD 323 for the rest of the semester.
AM205: lecture 2 Luna and Gary will hold a Python tutorial on Wednesday in 60 Oxford Street, Room 330 Assignment 1 will be posted this week Chris will hold office hours on Thursday (1:30pm 3:30pm, Pierce
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More informationRECONSTRUCTION AND ENHANCEMENT OF CURRENT DISTRIBUTION ON CURVED SURFACES FROM BIOMAGNETIC FIELDS USING POCS
DRAFT: October, 4: File: ramon-et-al pp.4 Page 4 Sheet of 8 CANADIAN APPLIED MATHEMATICS QUARTERLY Volume, Number, Summer RECONSTRUCTION AND ENHANCEMENT OF CURRENT DISTRIBUTION ON CURVED SURFACES FROM
More informationLecture 7: Most Common Edge Detectors
#1 Lecture 7: Most Common Edge Detectors Saad Bedros sbedros@umn.edu Edge Detection Goal: Identify sudden changes (discontinuities) in an image Intuitively, most semantic and shape information from the
More informationTechniques for Optimizing FEM/MoM Codes
Techniques for Optimizing FEM/MoM Codes Y. Ji, T. H. Hubing, and H. Wang Electromagnetic Compatibility Laboratory Department of Electrical & Computer Engineering University of Missouri-Rolla Rolla, MO
More informationNumerical Robustness. The implementation of adaptive filtering algorithms on a digital computer, which inevitably operates using finite word-lengths,
1. Introduction Adaptive filtering techniques are used in a wide range of applications, including echo cancellation, adaptive equalization, adaptive noise cancellation, and adaptive beamforming. These
More informationGTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013
GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationMatrix algorithms: fast, stable, communication-optimizing random?!
Matrix algorithms: fast, stable, communication-optimizing random?! Ioana Dumitriu Department of Mathematics University of Washington (Seattle) Joint work with Grey Ballard, James Demmel, Olga Holtz, Robert
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course
More informationScientific Computing. Some slides from James Lambers, Stanford
Scientific Computing Some slides from James Lambers, Stanford Dense Linear Algebra Scaling and sums Transpose Rank-one updates Rotations Matrix vector products Matrix Matrix products BLAS Designing Numerical
More informationLecture 11: Randomized Least-squares Approximation in Practice. 11 Randomized Least-squares Approximation in Practice
Stat60/CS94: Randomized Algorithms for Matrices and Data Lecture 11-10/09/013 Lecture 11: Randomized Least-squares Approximation in Practice Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these
More informationDiffusion Wavelets for Natural Image Analysis
Diffusion Wavelets for Natural Image Analysis Tyrus Berry December 16, 2011 Contents 1 Project Description 2 2 Introduction to Diffusion Wavelets 2 2.1 Diffusion Multiresolution............................
More information