Implementation of QR Up- and Downdating on a Massively Parallel Computer

Claus Bendtsen, Per Christian Hansen, Kaj Madsen, Hans Bruun Nielsen, Mustafa Pınar

July 8, 1996

Abstract

We describe an implementation of QR up- and downdating on a massively parallel computer (the Connection Machine CM-200) and show that the algorithm maps well onto the computer. In particular, we show how the use of corrected semi-normal equations for downdating can be efficiently implemented. We also illustrate the use of our algorithms in a new LP algorithm.

Key words. up- and downdating of QR factorization, corrected semi-normal equations, CM-200

1 Introduction

In this paper we describe an efficient implementation of updating and downdating of a QR factorization on the Connection Machine CM-200, which is a massively parallel SIMD computer [11]. Many of our considerations are general for massively parallel computers.

This project was sponsored by the Danish Center for Parallel Computer Research. M. Pınar was also sponsored by the Danish Natural Science Research Council, Grant No. C. Bendtsen and P. C. Hansen are with UNI-C (Danish Computing Centre for Research and Education), Building 305, Technical University of Denmark, DK-2800 Lyngby, Denmark (Claus.Bendtsen@uni-c.dk, Per.Christian.Hansen@uni-c.dk). K. Madsen, H. B. Nielsen and M. Pınar are with the Institute for Numerical Analysis, Building 305, Technical University of Denmark, DK-2800 Lyngby, Denmark (numikm@vm.uni-c.dk, numimpi@uts.uni-c.dk).
Many linear algebra routines can be implemented very efficiently on massively parallel computers [2]. However, it is not immediately clear whether updating and downdating of a QR factorization, in particular in the case where only the triangular matrix R is stored, provides enough parallelism for an efficient SIMD implementation. The main goal of this paper is to show that this is indeed the case.

Our work was motivated by the use of QR up- and downdating in a new algorithm for linear programming described in [7]. The algorithm was implemented on an 8K CM-200 located at UNI-C [8], and since there are no routines for QR up- and downdating in the CMSSL scientific subroutine library for the Connection Machine, it was necessary to implement such routines.

Throughout the paper, we are concerned with QR factorizations of the form

    A = Q R,                                                        (1)

with A ∈ R^(m×n), Q ∈ R^(m×m), R ∈ R^(m×n), m ≥ n. We assume that the matrix Q is not stored, and we want to recompute the triangular factor R efficiently when a row is either appended to A (updating) or removed from A (downdating). The algorithm for updating is a classical one (see, e.g., [5, §12.6]) and the downdating algorithm is a new "hybrid" algorithm using corrected semi-normal equations from [1]. Both algorithms are numerically stable.

Our paper is organized as follows. In §2 we summarize the up- and downdating algorithms and we investigate numerically the accuracy of the downdating algorithm. Implementation details are given in §3. In §4 we illustrate the use of our implementation in connection with the above-mentioned LP algorithm.

2 Up- and Downdating Algorithms

In this section we briefly summarize the algorithms for up- and downdating of a QR factorization.

2.1 Updating

If we wish to update the matrix A by an arbitrary new row u^T, then this row can always be permuted to the top. Hence, there is no loss of generality
in assuming an updating of the form

    A~ = [ u^T ]
         [ A   ]

where A~ is the updated matrix. If we rewrite this equation using (1), then we obtain

    A~ = [ 1  0^T ] [ u^T ]  =  [ 1  0^T ] H,
         [ 0  Q   ] [ R   ]     [ 0  Q   ]

where the matrix H = [u^T; R] has upper Hessenberg form. Now, if A~ = Q~ R~ is the QR factorization of A~, then it follows that we can obtain R~ by reducing H to upper triangular form R~ by means of orthogonal transformations.

Two types of orthogonal transformations are relevant here: Givens rotations and fast Givens rotations [5, §5.1]. In connection with the QR updating problem, the fast Givens method requires about 2.5n² flops¹ while the classical Givens method requires about 3n² flops. The fast Givens rotations have a reputation for being impractical because of the potential danger of over- or underflow; see [5, p. 209]. However, in this particular application this is no problem, since each row is only involved in two rotations, so the maximum growth in the elements is limited by a factor 4. We have therefore decided to use the fast Givens rotations. The detailed updating algorithm is given in the Appendix.

¹ Here, one flop is either an addition or a multiplication.

We mention in passing that the QR factorization provided in the CMSSL library uses a block-cyclic data layout [6]. We can easily make our updating algorithm conform with this layout by performing the fast Givens rotations in the same order as this layout (the details are straightforward and are omitted here). In this way, our routines are compatible with the CMSSL routines.

2.2 Classical (LINPACK) Downdating

If we want to remove an arbitrary row u^T from the matrix A, then again without loss of generality we can assume that A has the form

    A = [ u^T ]
        [ A~  ]
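The reduction of H = [u^T; R] to triangular form is easy to express in serial code. The sketch below is an illustrative pure-Python version that uses classical (not fast) Givens rotations for clarity, so the diagonal-scaling bookkeeping of the fast variant is absent; the names are ours and it is not our CM-FORTRAN code.

```python
import math

def qr_update(R, u):
    """Updated triangular factor after appending the row u to A = Q R.
    Classical-Givens sketch: R is an upper triangular n x n list of lists,
    u a length-n list; returns R~ with R~^T R~ = R^T R + u u^T."""
    n = len(u)
    H = [list(u)] + [list(row) for row in R]   # (n+1) x n upper Hessenberg
    for i in range(n):                         # zero the subdiagonal H[i+1][i]
        a, b = H[i][i], H[i + 1][i]
        r = math.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        for j in range(i, n):                  # rotate rows i and i+1
            t1, t2 = H[i][j], H[i + 1][j]
            H[i][j] = c * t1 + s * t2
            H[i + 1][j] = -s * t1 + c * t2
    return H[:n]                               # the first n rows form R~
```

Since Q is discarded, correctness is conveniently checked through the identity R~^T R~ = R^T R + u u^T.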
where A~ is equal to A with the first row u^T deleted. Now let q^T denote the first row of the matrix Q in the QR factorization of A. Then there exists an orthogonal matrix G such that

    G^T q = σ (1, 0, ..., 0)^T,  with σ = ±1.

In particular, if G is constructed as a sequence of Givens rotations, G = G_{m-1} ... G_1, where each rotation G_i involves elements i and i+1 of q, then it follows that G^T R has upper Hessenberg form, i.e.,

    G^T R = G_1^T ... G_{m-1}^T R = [ v^T ]                         (2)
                                    [ R~  ]

Moreover, we have that

    A = Q R = (Q G)(G^T R) = [ σ  0^T ] [ v^T ]  =  [ σ v^T ]
                             [ 0  Q~  ] [ R~  ]     [ Q~ R~ ]

and therefore u^T = σ v^T and A~ = Q~ R~, in which we identify Q~ R~ as the desired QR factorization of A~. This algorithm is mixed stable [9].

Often Q is not available because of storage considerations. Hence, the algorithm above must be modified to take this situation into account. Notice that only the n Givens transformations G_1, ..., G_n alter R; we need not determine G_{n+1}, ..., G_{m-1}. Now, if q_{1:n} denotes the first n components of q, then we have

    G_{n+1}^T ... G_{m-1}^T q = [ q_{1:n} ]   n
                                [ α       ]   1                     (3)
                                [ 0       ]   m-n-1

where

    α = ± √(1 - ||q_{1:n}||₂²).                                     (4)

Since Q is not available, we first have to compute the necessary quantities in (3). The vector q_{1:n} can be computed from the system

    R^T q_{1:n} = u,                                                (5)
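The same steps can be sketched serially with the rotations applied directly to R. The following illustrative pure-Python version uses classical Givens rotations in a mathematically equivalent formulation (the rotations that merge q_{1:n} into the α-slot are applied simultaneously to the rows of R); all names are ours.

```python
import math

def linpack_downdate(R, u):
    """LINPACK-style downdating sketch: overwrite the upper triangular R
    so that R~^T R~ = R^T R - u u^T, using classical Givens rotations."""
    n = len(u)
    # Solve R^T q = u by forward substitution (Eq. (5)).
    q = [0.0] * n
    for i in range(n):
        q[i] = (u[i] - sum(R[k][i] * q[k] for k in range(i))) / R[i][i]
    a2 = 1.0 - sum(qi * qi for qi in q)
    if a2 <= 0.0:
        raise ValueError("downdated factor would not be positive definite")
    a = math.sqrt(a2)                     # the alpha of Eq. (4)
    w = [0.0] * n                         # accumulates the row v^T of Eq. (2)
    for i in range(n - 1, -1, -1):        # rotations G_n, ..., G_1
        r = math.hypot(a, q[i])
        c, s = a / r, q[i] / r
        a = r
        for j in range(i, n):
            t = w[j]
            w[j] = c * t + s * R[i][j]
            R[i][j] = -s * t + c * R[i][j]
    return R                              # on exit, w reproduces u (a check)
```

Each rotation preserves the cross-product matrix of the two rows it touches, which is why the sweep removes exactly the contribution u u^T.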
and then α is computed from (4). From the vector in (3) we can then construct G_n, ..., G_1 and apply these rotations to R as in (2); in this way we produce R~. This is the well-known LINPACK algorithm [4, Ch. 10]. Again, we can use fast Givens transformations in our implementation without danger of over- or underflow, and we take into account the block-cyclic layout of R conforming with the CMSSL library.

2.3 CSNE Downdating

In [1] it is shown that the LINPACK downdating algorithm can be arbitrarily inaccurate, because the sole use of R to form the downdating transformations may lead to a much more ill-conditioned problem than using both Q and R. It is therefore proposed in [1] to use corrected semi-normal equations to improve the accuracy of q_{1:n}, computed by (5), before it is used to construct the Givens transformations in (2). We summarize the algorithm here and refer to [1] for more details:

CSNE Downdating of R
1. Solve R^T q_{1:n} = u for q_{1:n}
2. Solve R v = q_{1:n} for v
3. Let t = e_1 - A v, where e_1 = (1, 0, ..., 0)^T
4. Solve R^T δq = A^T t for δq
5. Let q_{1:n} = q_{1:n} + δq
6. Solve R δv = δq for δv
7. Let t = t - A δv
8. Let α = ||t||₂
9. Continue using the LINPACK algorithm.

The key to the improved stability is the refinement of q_{1:n} in steps 2-5, combined with a more accurate computation of α in step 8 via v and t (instead of Eq. (4)).

Applying the CSNE algorithm is obviously more expensive than applying the LINPACK algorithm. It is therefore recommended to use a hybrid algorithm where the CSNE algorithm is only used if the system is ill-conditioned and the LINPACK algorithm is used otherwise [1]. As a measure of the conditioning of the system we use α² = 1 - ||q_{1:n}||₂² = ||t||₂² and apply the CSNE algorithm if α² is less than a user-specified tolerance. We used a tolerance equal to 1/4 as recommended in [1].
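For illustration, steps 1-8 can be transcribed directly. The helper names below are ours, and the sketch assumes the full matrix A (with the removed row u^T as its first row) is available, as the CSNE correction requires.

```python
import math

def solve_RT(R, b):
    """Forward substitution with R^T (R upper triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        x[i] = (b[i] - sum(R[k][i] * x[k] for k in range(i))) / R[i][i]
    return x

def solve_R(R, b):
    """Back substitution with the upper triangular R."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

def csne_downdate_setup(R, A, u):
    """Steps 1-8 of CSNE downdating: returns the refined q_{1:n} and
    alpha = ||t||_2; A must contain the removed row u^T as its first row."""
    m, n = len(A), len(u)
    q = solve_RT(R, u)                                               # step 1
    v = solve_R(R, q)                                                # step 2
    t = [(1.0 if i == 0 else 0.0) - sum(A[i][j] * v[j] for j in range(n))
         for i in range(m)]                                          # step 3
    rhs = [sum(A[i][j] * t[i] for i in range(m)) for j in range(n)]  # A^T t
    dq = solve_RT(R, rhs)                                            # step 4
    q = [qi + di for qi, di in zip(q, dq)]                           # step 5
    dv = solve_R(R, dq)                                              # step 6
    t = [ti - sum(A[i][j] * dv[j] for j in range(n))
         for i, ti in enumerate(t)]                                  # step 7
    alpha = math.sqrt(sum(ti * ti for ti in t))                      # step 8
    return q, alpha
```

In exact arithmetic the correction δq vanishes, since t is orthogonal to the range of A; the benefit appears only in finite precision on ill-conditioned problems.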
2.4 Accuracy of CSNE Downdating

The updating algorithm is known to be numerically stable and to yield good accuracy, so we concentrate on studying the accuracy of the hybrid CSNE downdating algorithm. In order to test the accuracy of this algorithm and, in particular, its sensitivity to an ill-conditioned matrix A, we generate test matrices by the following strategy:

1. Generate a random (n + 1) × n matrix A with n even.
2. Modify column n/2 of A as follows:

       A_{:,n/2} = ε A_{:,n/2} + ((1 - ε)/2) (A_{:,n/2-1} + A_{:,n/2+1}),

   where 0 < ε ≤ 1.

In this way we can use ε to control the condition of A, since (for small ε) column n/2 is almost a linear combination of the columns n/2 - 1 and n/2 + 1.

We also study the effect of doing more than one iteration in the above algorithm, in the sense that we repeat steps 4-7 together with the additional step v = v + δv a number of times.

The results are shown in Fig. 1 for a series of random matrices. For each value of ε we generate 10 random matrices according to the above scheme, and for each matrix we compute R~ by means of the LINPACK algorithm, the CSNE algorithm and the CSNE algorithm with 1 additional iteration. We compare R~ with the matrix R~_direct from a QR factorization of A~ (consisting of the bottom 128 rows of A). Figure 1 shows the relative error ||R~ - R~_direct||₂ / ||R~_direct||₂ as a function of the parameter ε.

As expected, the CSNE algorithm is superior to the LINPACK algorithm with respect to sensitivity to an ill-conditioned problem. Furthermore we see that extra iterations in the CSNE algorithm never improve the accuracy of R~.

3 SIMD Implementation of Up- and Downdating

3.1 General Considerations

When implementing algorithms on a parallel computer it is essential to choose an appropriate data layout in order to be able to operate simultaneously on as many processing elements as possible. On massively parallel
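A hypothetical transcription of the test-matrix generator is shown below (0-based column indices; Gaussian entries are an assumption, since the text only says "random").

```python
import random

def make_test_matrix(n, eps):
    """Random (n+1) x n test matrix whose column n/2 (1-based; index
    n//2 - 1 here) is nearly a linear combination of its two neighbours.
    eps in (0, 1] controls the conditioning: smaller eps means a worse
    conditioned matrix."""
    assert n % 2 == 0 and n >= 4
    A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n + 1)]
    k = n // 2 - 1
    for row in A:
        row[k] = eps * row[k] + (1.0 - eps) * 0.5 * (row[k - 1] + row[k + 1])
    return A
```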
Figure 1: Accuracy of the hybrid CSNE algorithm and the LINPACK algorithm when downdating the test matrices described above. The plot shows the relative error ||R~ - R~_direct||₂ / ||R~_direct||₂ versus ε for the LINPACK algorithm, the CSNE algorithm, and the CSNE algorithm with 1 extra iteration.
computers we generally have three fundamentally different parallel layouts to choose from.

- Row-oriented layout: Each row is assigned to parallel processors and all the rows are stacked serially.
- Column-oriented layout: Each column is assigned to parallel processors and all the columns are stacked serially.
- Matrix-oriented layout: Rows as well as columns are assigned to parallel processors. Different data distributions are possible in this approach.

Generally some sort of matrix-oriented layout is used in conjunction with matrix operations since, even for matrices of small dimensions, this ensures that elements are allocated to all processors. The different approaches will now be addressed with respect to QR up- and downdating.

3.1.1 Row-Oriented Layout

Using this approach one can perform the fast Givens rotations in the up- and downdating in parallel without introducing communication other than broadcasting the constants α and β which define the Givens rotations. When a rotation is performed between two rows, the processors have to be activated a number of times. The resulting efficiency can be approximated as follows. Let n be the number of columns and p the number of processors, and write n = ⌊n/p⌋p + q, where q is the remainder. Then

    E ≈ (n²/2) / ( (p²/2) ⌊n/p⌋ (⌊n/p⌋ + 1) + p (⌊n/p⌋ + 1) q ).

The drawback of this layout is that a large problem size relative to the number of processors is needed in order to obtain decent results. E.g., if n = p/2 only 25% of the processor performance is utilized, as opposed to 50% if n = p or approximately 67% if n = 2p.

The solution of the linear systems related to the downdating can be efficiently implemented using the row version of back substitution [5, p. 88], whereas transposed systems can be efficiently solved using the column version of forward substitution [5, p. 89].
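The estimate can be evaluated directly; the small helper below is ours, added for illustration, and reproduces the utilization figures quoted above.

```python
def row_layout_efficiency(n, p):
    """Approximate processor utilization E of the row-oriented layout for
    n columns on p processors, per the estimate above:
    f = floor(n/p), q = n mod p."""
    f, q = divmod(n, p)
    useful = 0.5 * n * n                           # serial work, ~n^2/2
    busy = 0.5 * p * p * f * (f + 1) + p * (f + 1) * q
    return useful / busy
```

For example, with p = 16 processors the formula gives 25% utilization at n = p/2, 50% at n = p, and 2/3 at n = 2p.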
3.1.2 Column-Oriented Layout

This layout cannot be efficiently used, since the Givens rotations would have to be performed on one processor at a time.

3.1.3 Matrix-Oriented Layout

If the processors are considered as configured in a grid, then a row in the matrix will live on a row of processors. Each row as well as each column may be wrapped on the processor grid. One possible matrix-oriented layout is the block-cyclic layout often used for gaining good load balance in linear algebra applications [6].

If we use a block-cyclic layout, then the Givens rotations can be performed in parallel without introducing communication other than broadcasting the constants α and β when the rows involved in the rotation are part of the same block, and by introducing nearest-neighbor communication equal to one shift operation if they are not part of the same block. Only one shift operation is needed to ensure that all the forward/backward references live on the same processor as the active row. It can be shown that the CSNE downdating can be performed in the block-cyclic domain without introducing any overhead.

The efficiency of this layout is by nature much lower than for the row-oriented layout, since only one or two rows of processors will be active simultaneously. This layout should not be discarded, though, since overall considerations of efficiency might imply using a matrix-oriented layout. In any case this layout is attractive if the alternative is refactorization.

3.2 Implementation on the Connection Machine CM-200

The updating and hybrid downdating algorithms have been implemented in CM-FORTRAN on an 8K CM-200, but the code also applies to the slower CM-2. We have implemented the row-oriented as well as the matrix-oriented layout. It has not been possible to activate only the processors on and to the right of the diagonal of the matrix, which means that we do operations on zeros.
Furthermore, in the matrix-oriented implementation it has not been possible to detect whether two rows live on the same processors, resulting in unnecessary activation of the communication primitives.

The implementation using a matrix-oriented layout is based on a (:news, :news) layout, has storage requirements of roughly three times the size of
Figure 2: Performance of up- and downdating compared to the CMSSL QR factorization. The plot shows execution time (sec.) versus n for the CMSSL QR factorization, CSNE downdating, LINPACK downdating, and updating.

the factorized matrix, and uses n - 1 shift operations. Since no shift operations are available for the block-cyclic layout, this implementation assumes normal ordering in order to avoid one s-operation per loop iteration. Both up- and downdating perform very poorly; less than 5 Mflop/s is obtained. The bottleneck can be identified from the assembler code and is the creation of the masks related to performing the Givens rotations.

The implementation which is based on a row-oriented layout uses a (:serial, :news) layout and has storage requirements of roughly 2n². Here the block-cyclic ordering is efficiently adaptable. The performance of these routines is shown in Fig. 2 together with results from refactorization using the CMSSL library².

² The timings were produced on the 8K CM-200 present at UNI-C with CMF compiler V. 1.2 and CMSSL library V. 3.1 Beta 2. The QR factorization timings were produced on a (:news, :news) layout since this seems to yield the highest performance.

It is seen that the CSNE downdating is an
order of magnitude slower than the updating and LINPACK downdating, but it is still an order of magnitude faster than the refactorization for large n. Since it has not been possible to avoid the excessive engagement of virtual processors, it should be fairly easy to halve the computational work using a lower-level language than CM-FORTRAN.

4 An Application in Linear Programming

In this section we illustrate the use of the QR up- and downdating routines in an implementation of a new algorithm for linear programming based on a continuation algorithm. Today, such algorithms are realistic alternatives to the classical simplex method. Consider the normalized linear programming problem

    min_x c^T x  subject to  G x = b,  -e ≤ x ≤ e,                  (6)

where G is m₁ × n, c is an n-vector, and e = (1, ..., 1)^T. The continuation algorithm used here [7] solves this problem by first solving its dual problem,

    min_y ||G^T y + c||₁ + b^T y,                                   (7)

and then recovering x from the residual vector G^T y + c. The dual problem (7) is solved by an algorithm that essentially substitutes the non-smooth 1-norm with a smooth "Huber" norm with threshold γ, where the components of the residual vector G^T y + c are treated differently depending on whether they are greater than or smaller than γ. The key idea is to start with a large γ and then reduce it until the solution of (7) can be identified from the solution to the "Huber" problem. This happens for a positive value of γ.

The main computational problem during the algorithm outlined above is to solve a series of linear systems of equations, where the coefficient matrix in each iteration step is modified by a few rank-one up- and downdates. Instead of computing a refactorization of the matrix each time, we use the up- and downdating routines described above and fall back on refactorization only in the few cases when it is necessary due to rank deficiency. See [8] for more details about the implementation of the complete algorithm.
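The smoothing can be illustrated with the scalar Huber function. The form below is the standard one with threshold γ, quadratic for small residuals and linear for large ones; the exact scaling used in [7] may differ.

```python
def huber(r, gamma):
    """Smooth approximation to |r|: quadratic for |r| <= gamma, linear
    beyond, and approaching |r| as gamma -> 0 (standard Huber form)."""
    if abs(r) <= gamma:
        return r * r / (2.0 * gamma)
    return abs(r) - gamma / 2.0
```

Applied componentwise to G^T y + c, this replaces the non-smooth 1-norm in (7) with a once-differentiable objective.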
The continuation algorithm was compared with an implementation of the simplex algorithm GEN SIMPLEX provided in the CMSSL library [10].
Figure 3: Execution times for the CMSSL simplex algorithm and the continuation method, versus problem size. The plot shows execution time (sec.) against the problem size (m₁ + n)n for the simplex and continuation (LP) algorithms.
The two algorithms were tested with a number of randomly generated dense matrices G of varying size. The execution times are shown in Fig. 3 as a function of the quantity (m₁ + n)n, which is a measure of the "size" of the LP problem. Each point corresponds to the average of five measured timings. We see that the two algorithms require approximately the same execution time, and that the continuation algorithm is asymptotically faster than the simplex algorithm for these dense test problems. Without the up- and downdating routines, the continuation algorithm would be up to 10 times slower for large n. Moreover, we remark that the continuation algorithm typically produces more accurate results than the simplex algorithm. See [8] for details.

5 Conclusion

We have shown that stable up- and downdating of a QR factorization can be implemented efficiently on a massively parallel computer, even without the use of low-level routines. We have also shown that a new LP algorithm, based on our routines, is competitive with a simplex routine from the CMSSL library for the Connection Machines.

References

[1] Å. Björck, H. Park & L. Eldén, Accurate downdating of least squares solutions, SIAM J. Matrix Anal. Appl. 15 (1994), to appear.

[2] J. Demmel, M. Heath & H. A. van der Vorst, Parallel numerical linear algebra, Acta Numerica (1993), 111-197.

[3] J. J. Dongarra, I. S. Duff, D. C. Sorensen & H. A. van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, Philadelphia.

[4] J. J. Dongarra, C. B. Moler, J. R. Bunch & G. W. Stewart, LINPACK Users' Guide, SIAM.

[5] G. H. Golub & C. F. Van Loan, Matrix Computations, Second Edition, The Johns Hopkins University Press, Baltimore.
[6] W. Lichtenstein & S. L. Johnsson, Block cyclic dense linear algebra, Report TMC-215, Thinking Machines Corporation, 1991; to appear in SIAM J. Sci. Comp.

[7] K. Madsen, H. B. Nielsen & M. C. Pınar, A New Finite Continuation Algorithm for Linear Programming, Report NI-93-07, Institute for Numerical Analysis, Technical University of Denmark; submitted to SIAM J. Optim.

[8] K. Madsen, H. B. Nielsen, M. C. Pınar, C. Bendtsen & P. C. Hansen, Solving Bounded Variable Linear Problems on the Connection Machine CM-200, Report NI-93-08, Institute for Numerical Analysis, Technical University of Denmark.

[9] C. C. Paige, Error analysis of some techniques for updating orthogonal decompositions, Math. Comp. 34 (1980), 465-471.

[10] Thinking Machines Corporation, CMSSL for CM Fortran, Version 3.0, CM-200 Edition.

[11] Thinking Machines Corporation, CM-200 Technical Summary.

Appendix: Algorithms

In this appendix we give the detailed up- and downdating algorithms. The implementations are based on [5, §5].

Updating of R

function R~ = qrud(R, u)
  [m, n] = size(R)
  [α, β, type, δ₁, δ₂] = fastgivens(u_1, R(1,1), 1, 1); μ = 1/√δ₁
  if type = 1
    R~(1,1:n) = μ (β u(1:n) + R(1,1:n)); R(1,1:n) = u(1:n) + α R(1,1:n)
  else
    R~(1,1:n) = μ (u(1:n) + β R(1,1:n)); R(1,1:n) = α u(1:n) + R(1,1:n)
  for i = 2 : n-1
    [α, β, type, δ₁, δ₂] = fastgivens(R(i-1,i), R(i,i), δ₂, 1); μ = 1/√δ₁
    if type = 1
      R~(i,i:n) = μ (β R(i-1,i:n) + R(i,i:n)); R(i,i:n) = R(i-1,i:n) + α R(i,i:n)
    else
      R~(i,i:n) = μ (R(i-1,i:n) + β R(i,i:n)); R(i,i:n) = α R(i-1,i:n) + R(i,i:n)
  [α, β, type, δ₁, δ₂] = fastgivens(R(n-1,n), R(n,n), δ₂, 1); μ = 1/√δ₁
  if type = 1
    R~(n,n) = μ (β R(n-1,n) + R(n,n))
  else
    R~(n,n) = μ (R(n-1,n) + β R(n,n))

The flop count for this algorithm is approximately 5n²/2. Here, the function "fastgivens" is implemented as follows:

function [α, β, type, δ₁, δ₂] = fastgivens(ξ₁, ξ₂, δ₁, δ₂)
  if ξ₂ ≠ 0
    α = -ξ₁/ξ₂; β = -α δ₂/δ₁; γ = -α β
    if γ ≤ 1
      type = 1; τ = δ₁; δ₁ = (1 + γ) δ₂; δ₂ = (1 + γ) τ
    else
      type = 2; α = 1/α; β = 1/β; γ = 1/γ
      δ₁ = (1 + γ) δ₁; δ₂ = (1 + γ) δ₂
  else
    type = 2; α = 0; β = 0

LINPACK Downdating of R

function R~ = qrdd(R, u)
  [m, n] = size(R)
  q = R(1:n,1:n)^(-T) u; a = √(1 - ||q||₂²)
  [α, β, type, δ₁, δ₂] = fastgivens(q_n, a, 1, 1); μ = 1/√δ₂
  if type = 1
    R~(n,n) = μ R(n,n); R(n,n) = β R(n,n); q_n = β q_n + a
  else
    R~(n,n) = μ α R(n,n); q_n = q_n + β a
  for i = n-1 : -1 : 2
    [α, β, type, δ₁, δ₂] = fastgivens(q_i, q_{i+1}, 1, δ₂); μ = 1/√δ₂
    if type = 1
      R~(i,i:n) = μ (R(i,i:n) + α R(i+1,i:n))
      R(i,i:n) = β R(i,i:n) + R(i+1,i:n); q_i = β q_i + q_{i+1}
    else
      R~(i,i:n) = μ (α R(i,i:n) + R(i+1,i:n))
      R(i,i:n) = R(i,i:n) + β R(i+1,i:n); q_i = q_i + β q_{i+1}
  [α, β, type, δ₁, δ₂] = fastgivens(q_1, q_2, 1, δ₂); μ = 1/√δ₂
  if type = 1
    R~(1,1:n) = μ (R(1,1:n) + α R(2,1:n))
  else
    R~(1,1:n) = μ (α R(1,1:n) + R(2,1:n))

This algorithm also uses "fastgivens", and it requires approximately 5n²/2 flops.
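A pure-Python transcription of fastgivens (illustrative, following the pseudocode above) makes it easy to check the scaled representation: the true transformed entry is the stored entry divided by the square root of its weight, so annihilating one component of a vector must preserve its de-scaled 2-norm.

```python
import math

def fastgivens(x1, x2, d1, d2):
    """Fast Givens parameters (alpha, beta, type) annihilating x2 against
    x1, with the scaling weights d1, d2 updated and returned."""
    if x2 != 0.0:
        alpha = -x1 / x2
        beta = -alpha * d2 / d1
        gamma = -alpha * beta
        if gamma <= 1.0:
            ftype = 1
            d1, d2 = (1.0 + gamma) * d2, (1.0 + gamma) * d1
        else:
            ftype = 2
            alpha, beta, gamma = 1.0 / alpha, 1.0 / beta, 1.0 / gamma
            d1, d2 = (1.0 + gamma) * d1, (1.0 + gamma) * d2
    else:
        ftype, alpha, beta = 2, 0.0, 0.0
    return alpha, beta, ftype, d1, d2

# Annihilate 4 against 3 (unit weights): de-scaled result must have norm 5.
alpha, beta, ftype, d1, d2 = fastgivens(3.0, 4.0, 1.0, 1.0)
y1 = beta * 3.0 + 4.0 if ftype == 1 else 3.0 + beta * 4.0
```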
More informationBlocked Schur Algorithms for Computing the Matrix Square Root. Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui. MIMS EPrint: 2012.
Blocked Schur Algorithms for Computing the Matrix Square Root Deadman, Edvin and Higham, Nicholas J. and Ralha, Rui 2013 MIMS EPrint: 2012.26 Manchester Institute for Mathematical Sciences School of Mathematics
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 5 Vector and Matrix Products Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel
More informationDense matrix algebra and libraries (and dealing with Fortran)
Dense matrix algebra and libraries (and dealing with Fortran) CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Dense matrix algebra and libraries (and dealing with Fortran)
More informationA Few Numerical Libraries for HPC
A Few Numerical Libraries for HPC CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) A Few Numerical Libraries for HPC Spring 2016 1 / 37 Outline 1 HPC == numerical linear
More informationSparse Matrix Libraries in C++ for High Performance. Architectures. ferent sparse matrix data formats in order to best
Sparse Matrix Libraries in C++ for High Performance Architectures Jack Dongarra xz, Andrew Lumsdaine, Xinhui Niu Roldan Pozo z, Karin Remington x x Oak Ridge National Laboratory z University oftennessee
More informationA parallel frontal solver for nite element applications
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 2001; 50:1131 1144 A parallel frontal solver for nite element applications Jennifer A. Scott ; Computational Science
More informationLINUX. Benchmark problems have been calculated with dierent cluster con- gurations. The results obtained from these experiments are compared to those
Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications Michael Eberl 1, Wolfgang Karl 1, Carsten Trinitis 1 and Andreas Blaszczyk 2 1 Technische Universitat Munchen
More informationPerformance Evaluation of BLAS on a Cluster of Multi-Core Intel Processors
Performance Evaluation of BLAS on a Cluster of Multi-Core Intel Processors Mostafa I. Soliman and Fatma S. Ahmed Computers and Systems Section, Electrical Engineering Department Aswan Faculty of Engineering,
More informationAdaptive Nonlinear Discriminant Analysis. by Regularized Minimum Squared Errors
Adaptive Nonlinear Discriminant Analysis by Regularized Minimum Squared Errors Hyunsoo Kim, Barry L Drake, and Haesun Park February 23, 2005 Abstract: Recently, kernelized nonlinear extensions of Fisher
More informationJournal of Engineering Research and Studies E-ISSN
Journal of Engineering Research and Studies E-ISS 0976-79 Research Article SPECTRAL SOLUTIO OF STEADY STATE CODUCTIO I ARBITRARY QUADRILATERAL DOMAIS Alavani Chitra R 1*, Joshi Pallavi A 1, S Pavitran
More informationFloating Point Fault Tolerance with Backward Error Assertions
Floating Point Fault Tolerance with Backward Error Assertions Daniel Boley Gene H. Golub * Samy Makar Nirmal Saxena Edward J. McCluskey Computer Science Dept. Computer Science Dept. Center for Reliable
More informationIIAIIIIA-II is called the condition number. Similarly, if x + 6x satisfies
SIAM J. ScI. STAT. COMPUT. Vol. 5, No. 2, June 1984 (C) 1984 Society for Industrial and Applied Mathematics OO6 CONDITION ESTIMATES* WILLIAM W. HAGERf Abstract. A new technique for estimating the 11 condition
More informationA class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines
Available online at www.sciencedirect.com Procedia Computer Science 9 (2012 ) 17 26 International Conference on Computational Science, ICCS 2012 A class of communication-avoiding algorithms for solving
More informationBlocked Schur Algorithms for Computing the Matrix Square Root
Blocked Schur Algorithms for Computing the Matrix Square Root Edvin Deadman 1, Nicholas J. Higham 2,andRuiRalha 3 1 Numerical Algorithms Group edvin.deadman@nag.co.uk 2 University of Manchester higham@maths.manchester.ac.uk
More informationLAPACK. Linear Algebra PACKage. Janice Giudice David Knezevic 1
LAPACK Linear Algebra PACKage 1 Janice Giudice David Knezevic 1 Motivating Question Recalling from last week... Level 1 BLAS: vectors ops Level 2 BLAS: matrix-vectors ops 2 2 O( n ) flops on O( n ) data
More informationAlgebraic Iterative Methods for Computed Tomography
Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic
More informationModule 5.5: nag sym bnd lin sys Symmetric Banded Systems of Linear Equations. Contents
Module Contents Module 5.5: nag sym bnd lin sys Symmetric Banded Systems of nag sym bnd lin sys provides a procedure for solving real symmetric or complex Hermitian banded systems of linear equations with
More informationBMVC 1996 doi: /c.10.41
On the use of the 1D Boolean model for the description of binary textures M Petrou, M Arrigo and J A Vons Dept. of Electronic and Electrical Engineering, University of Surrey, Guildford GU2 5XH, United
More informationComparing SIMD and MIMD Programming Modes Ravikanth Ganesan, Kannan Govindarajan, and Min-You Wu Department of Computer Science State University of Ne
Comparing SIMD and MIMD Programming Modes Ravikanth Ganesan, Kannan Govindarajan, and Min-You Wu Department of Computer Science State University of New York Bualo, NY 14260 Abstract The Connection Machine
More informationInternational Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA
International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI
More informationNetSolve: A Network Server. for Solving Computational Science Problems. November 27, Abstract
NetSolve: A Network Server for Solving Computational Science Problems Henri Casanova Jack Dongarra? y November 27, 1995 Abstract This paper presents a new system, called NetSolve, that allows users to
More informationA Fast and Exact Simulation Algorithm for General Gaussian Markov Random Fields
A Fast and Exact Simulation Algorithm for General Gaussian Markov Random Fields HÅVARD RUE DEPARTMENT OF MATHEMATICAL SCIENCES NTNU, NORWAY FIRST VERSION: FEBRUARY 23, 1999 REVISED: APRIL 23, 1999 SUMMARY
More informationSparse matrices, graphs, and tree elimination
Logistics Week 6: Friday, Oct 2 1. I will be out of town next Tuesday, October 6, and so will not have office hours on that day. I will be around on Monday, except during the SCAN seminar (1:25-2:15);
More informationSDLS: a Matlab package for solving conic least-squares problems
SDLS: a Matlab package for solving conic least-squares problems Didier Henrion, Jérôme Malick To cite this version: Didier Henrion, Jérôme Malick. SDLS: a Matlab package for solving conic least-squares
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationMOTION. Feature Matching/Tracking. Control Signal Generation REFERENCE IMAGE
Head-Eye Coordination: A Closed-Form Solution M. Xie School of Mechanical & Production Engineering Nanyang Technological University, Singapore 639798 Email: mmxie@ntuix.ntu.ac.sg ABSTRACT In this paper,
More informationAccelerating GPU kernels for dense linear algebra
Accelerating GPU kernels for dense linear algebra Rajib Nath, Stanimire Tomov, and Jack Dongarra Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville {rnath1, tomov,
More informationMATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix.
MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix. Row echelon form A matrix is said to be in the row echelon form if the leading entries shift to the
More informationConvex Optimization / Homework 2, due Oct 3
Convex Optimization 0-725/36-725 Homework 2, due Oct 3 Instructions: You must complete Problems 3 and either Problem 4 or Problem 5 (your choice between the two) When you submit the homework, upload a
More informationNAG Fortran Library Routine Document F08BHF (DTZRZF).1
NAG Fortran Library Routine Document Note: before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent
More informationParallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors
Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors Andrés Tomás 1, Zhaojun Bai 1, and Vicente Hernández 2 1 Department of Computer
More informationAbstract. 1 Introduction
The performance of fast Givens rotations problem implemented with MPI extensions in multicomputers L. Fernández and J. M. García Department of Informática y Sistemas, Universidad de Murcia, Campus de Espinardo
More informationTable-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department o
Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department of Computer Science and Information Engineering National
More informationThe LINPACK Benchmark on the Fujitsu AP 1000
The LINPACK Benchmark on the Fujitsu AP 1000 Richard P. Brent Computer Sciences Laboratory Australian National University Canberra, Australia Abstract We describe an implementation of the LINPACK Benchmark
More informationExtra-High Speed Matrix Multiplication on the Cray-2. David H. Bailey. September 2, 1987
Extra-High Speed Matrix Multiplication on the Cray-2 David H. Bailey September 2, 1987 Ref: SIAM J. on Scientic and Statistical Computing, vol. 9, no. 3, (May 1988), pg. 603{607 Abstract The Cray-2 is
More informationMatrices. Chapter Matrix A Mathematical Definition Matrix Dimensions and Notation
Chapter 7 Introduction to Matrices This chapter introduces the theory and application of matrices. It is divided into two main sections. Section 7.1 discusses some of the basic properties and operations
More informationreasonable to store in a software implementation, it is likely to be a signicant burden in a low-cost hardware implementation. We describe in this pap
Storage-Ecient Finite Field Basis Conversion Burton S. Kaliski Jr. 1 and Yiqun Lisa Yin 2 RSA Laboratories 1 20 Crosby Drive, Bedford, MA 01730. burt@rsa.com 2 2955 Campus Drive, San Mateo, CA 94402. yiqun@rsa.com
More informationy(b)-- Y[a,b]y(a). EQUATIONS ON AN INTEL HYPERCUBE*
SIAM J. ScI. STAT. COMPUT. Vol. 12, No. 6, pp. 1480-1485, November 1991 ()1991 Society for Industrial and Applied Mathematics 015 SOLUTION OF LINEAR SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS ON AN INTEL
More informationWei Shu and Min-You Wu. Abstract. partitioning patterns, and communication optimization to achieve a speedup.
Sparse Implementation of Revised Simplex Algorithms on Parallel Computers Wei Shu and Min-You Wu Abstract Parallelizing sparse simplex algorithms is one of the most challenging problems. Because of very
More informationPerformance Evaluation of a New Parallel Preconditioner
Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller Marco Zagha School of Computer Science Carnegie Mellon University 5 Forbes Avenue Pittsburgh PA 15213 Abstract The
More informationJ.A.J.Hall, K.I.M.McKinnon. September 1996
PARSMI, a parallel revised simplex algorithm incorporating minor iterations and Devex pricing J.A.J.Hall, K.I.M.McKinnon September 1996 MS 96-012 Supported by EPSRC research grant GR/J0842 Presented at
More informationAll use is subject to licence, see For any commercial application, a separate licence must be signed.
HS PAKAGE SPEIFIATION HS 2007 1 SUMMARY This routine uses the Generalized Minimal Residual method with restarts every m iterations, GMRES(m), to solve the n n unsymmetric linear system Ax = b, optionally
More informationSparse Linear Systems
1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite
More informationLab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD
Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Goals. The goal of the first part of this lab is to demonstrate how the SVD can be used to remove redundancies in data; in this example
More informationAnalysis of the GCR method with mixed precision arithmetic using QuPAT
Analysis of the GCR method with mixed precision arithmetic using QuPAT Tsubasa Saito a,, Emiko Ishiwata b, Hidehiko Hasegawa c a Graduate School of Science, Tokyo University of Science, 1-3 Kagurazaka,
More informationProject Report. 1 Abstract. 2 Algorithms. 2.1 Gaussian elimination without partial pivoting. 2.2 Gaussian elimination with partial pivoting
Project Report Bernardo A. Gonzalez Torres beaugonz@ucsc.edu Abstract The final term project consist of two parts: a Fortran implementation of a linear algebra solver and a Python implementation of a run
More informationHowever, m pq is just an approximation of M pq. As it was pointed out by Lin [2], more precise approximation can be obtained by exact integration of t
FAST CALCULATION OF GEOMETRIC MOMENTS OF BINARY IMAGES Jan Flusser Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodarenskou vez 4, 82 08 Prague 8, Czech
More informationNetwork. Department of Statistics. University of California, Berkeley. January, Abstract
Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,
More informationProblem Set 2 Geometry, Algebra, Reality
Problem Set 2 Geometry, Algebra, Reality Applied Mathematics 121 Spring 2011 Due 5:00 PM, Friday, February 11, 2011 Announcements The assignment is due by 5:00 PM, Friday, February 11, 2011. Readings:
More informationColumn-Action Methods in Image Reconstruction
Column-Action Methods in Image Reconstruction Per Christian Hansen joint work with Tommy Elfving Touraj Nikazad Overview of Talk Part 1: the classical row-action method = ART The advantage of algebraic
More informationCS 770G - Parallel Algorithms in Scientific Computing
CS 770G - Parallel lgorithms in Scientific Computing Dense Matrix Computation II: Solving inear Systems May 28, 2001 ecture 6 References Introduction to Parallel Computing Kumar, Grama, Gupta, Karypis,
More informationSection 3.1 Gaussian Elimination Method (GEM) Key terms
Section 3.1 Gaussian Elimination Method (GEM) Key terms Rectangular systems Consistent system & Inconsistent systems Rank Types of solution sets RREF Upper triangular form & back substitution Nonsingular
More informationNAG Library Function Document nag_zgelsy (f08bnc)
NAG Library Function Document nag_zgelsy () 1 Purpose nag_zgelsy () computes the minimum norm solution to a complex linear least squares problem minkb Axk 2 x using a complete orthogonal factorization
More informationQR Decomposition on GPUs
QR Decomposition QR Algorithms Block Householder QR Andrew Kerr* 1 Dan Campbell 1 Mark Richards 2 1 Georgia Tech Research Institute 2 School of Electrical and Computer Engineering Georgia Institute of
More informationEcient Cubic B-spline Image interpolation on a GPU. 1 Abstract. 2 Introduction. F. Champagnat and Y. Le Sant. September 1, 2011
Ecient Cubic B-spline Image interpolation on a GPU F. Champagnat and Y. Le Sant September 1, 2011 1 Abstract Application of geometric transformation to images requires an interpolation step. When applied
More informationAim. Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity. Structure and matrix sparsity: Overview
Aim Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity Julian Hall School of Mathematics University of Edinburgh jajhall@ed.ac.uk What should a 2-hour PhD lecture on structure
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationParallel Auction Algorithm for Linear Assignment Problem
Parallel Auction Algorithm for Linear Assignment Problem Xin Jin 1 Introduction The (linear) assignment problem is one of classic combinatorial optimization problems, first appearing in the studies on
More informationB(FOM) 2. Block full orthogonalization methods for functions of matrices. Kathryn Lund. December 12, 2017
B(FOM) 2 Block full orthogonalization methods for functions of matrices Kathryn Lund December 12, 2017 The block full orthogonalization methods for functions of matrices (denoted B(FOM) 2, for short) are
More informationNumerical Algorithms
Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0
More informationChapter 2 Basic Structure of High-Dimensional Spaces
Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,
More information