Technical note: A successive over-relaxation pre-conditioner to solve mixed model equations for genetic evaluation¹
Karin Meyer²
Animal Genetics and Breeding Unit³, University of New England, Armidale NSW 2351, Australia

ABSTRACT: A computationally efficient preconditioned conjugate gradient algorithm with a symmetric successive over-relaxation (SSOR) preconditioner for the iterative solution of sets of mixed model equations is described. Potential computational savings of this approach are examined for an example of single-step genomic evaluation of Australian sheep. Results show that the SSOR preconditioner can substantially reduce the number of iterates required for solutions to converge compared to simpler preconditioners, with marked reductions in overall computing time.

Keywords: genetic evaluation, preconditioned conjugate gradient algorithm, SSOR preconditioner, computational requirements

INTRODUCTION

Estimation of breeding values in genetic evaluation schemes for livestock requires solving a set of mixed model equations (MME). Often, the number of equations is too large for solutions to be obtained directly, i.e. by inversion or triangular decomposition of the coefficient matrix in the MME, and iterative methods need to be employed. Early applications tended to rely on Gauss-Seidel type solution schemes, often with over-relaxation to improve convergence rates. Over the last two decades, however, the preconditioned conjugate gradient (PCG) algorithm has become the standard method to solve large systems of MME in animal breeding applications (e.g. Strandén and Lidauer, 1999; Tsuruta et al., 2001). Convergence rates of the PCG algorithm depend on the distribution of eigenvalues of the coefficient matrix in the MME. This can be improved, i.e. their spread and the condition number of the matrix can be reduced, by a suitable choice of preconditioning matrix.
Loosely speaking, the more closely this matrix resembles the coefficient matrix, the fewer iterations are required. In practice, however, this needs to be balanced against the computational requirements both to set up and store the preconditioner and to apply it for each iterate, and many schemes thus rely on simple, diagonal or block-diagonal matrices (Strandén et al., 2002). This paper describes an alternative preconditioner which has the same structure as the coefficient matrix, together with an efficient implementation in a PCG algorithm. We demonstrate for an applied example that it can substantially reduce the number of iterations and overall computing time required.

¹ This work was supported by Meat and Livestock Australia under grants B.BFG.0050, B.SGN.0027 and B.SGN[…]. I am indebted to A. Swan for making the example data used available.
² Corresponding author: kmeyer@une.edu.au
³ A joint venture with the NSW Department of Primary Industries

THE SSOR PRECONDITIONER

Let

$$ A x = b \tag{1} $$

denote the system of MME to be solved, with $A$ the coefficient matrix, $x$ the vector of unknowns and $b$ the vector of right hand sides. Equivalently, solutions for $x$ can be obtained by solving

$$ M^{-1} A x = M^{-1} b \tag{2} $$

instead, with $M$ denoting the preconditioner. Typically, the matrix $M$ is chosen so that it and the product $M^{-1} r$ are easy to calculate (with $r$ denoting a vector) whilst reducing the condition number of $M^{-1} A$ (Benzi, 2002).

Saad (2003; Chapter 4) showed that an iterative solution scheme with over-relaxation is equivalent to fixed point iteration on a preconditioned system. Decompose the coefficient matrix $A$ into

$$ A = L + D + L' \tag{3} $$

with $L = \{a_{ij}\}$ for $j < i$ the matrix comprising the strictly lower triangle of $A$ and $D = \mathrm{Diag}(A) = \{a_{ii}\}$ the matrix of diagonal elements. The preconditioner corresponding to a symmetric successive over-relaxation (SSOR) scheme is

$$ M_{SSOR} = \frac{1}{\omega(2-\omega)} \, (D + \omega L) \, D^{-1} \, (D + \omega L)' \tag{4} $$

with $0 < \omega < 2$ (Saad, 2003). Matrix $M_{SSOR}$ has the same structure (i.e. number and position of non-zero elements) as $A$, and can be used as preconditioner with a PCG algorithm. Saad (2003) stated that the choice of $\omega$ for a preconditioner is not important and recommended a value of $\omega = 1$. Each PCG iterate then requires evaluating the product of the inverse of $M$ and a vector, $M_{SSOR}^{-1} r = h$. Fortunately, this can be obtained without even setting up $M_{SSOR}$ explicitly by exploiting the factorisation in (4) into upper and lower triangular matrices.
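A short expansion may help to see how closely the preconditioner in (4) resembles the coefficient matrix. Assuming $\omega = 1$ and using the decomposition (3), multiplying out the triangular factors gives:

```latex
\begin{aligned}
M_{SSOR} &= (D + L)\, D^{-1}\, (D + L)' \\
         &= (D + L)\,(I + D^{-1} L') \\
         &= D + L' + L + L D^{-1} L' \\
         &= A + L D^{-1} L'
\end{aligned}
```

Under this assumption the product of the two triangular factors differs from $A$ only by the term $L D^{-1} L'$, whereas a diagonal preconditioner $M = D$ ignores all off-diagonal elements of $A$.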
Moreover, as $D + L$ in $M_{SSOR}$ (for $\omega = 1$) is simply the lower triangle of $A$, there is no computational overhead to set up $M$ or additional memory required to store it. We refer to the process of obtaining direct solutions for a system of equations with lower or upper triangular coefficient matrix as a triangular solve. Solving $M_{SSOR} h = r$ for $h$ then involves 3 steps:
i) Solve $(D + \omega L)\, t_1 = r$ using a forward triangular solve, which gives $t_1 = D^{-1} (D + \omega L)' h$.
ii) Calculate $t_2 = D t_1$.
iii) Apply a second, backward triangular solve to $(D + \omega L)' h = t_2$ for $h$.
However, whilst easy to implement, this implies a substantial number of multiplications, which increase the calculations required per iterate in a standard PCG algorithm to at least twice those needed for a diagonal preconditioner, i.e. $M = D$.

Improved SSOR

Improved versions of a PCG algorithm using a SSOR preconditioner have been described by Han and Zhang (2011) and Li et al. (2013). These reduce computations per iterate substantially by recognizing that calculation of the two expensive matrix-vector products, namely i) of the coefficient matrix and a vector of directions, $A d$, and ii) of the inverse of the preconditioning matrix and a vector, $M_{SSOR}^{-1} r$, can be replaced. Instead, only the solutions for two triangular systems of equations are required. We adopt the procedure of Li et al. (2013), which differs from that of Han and Zhang (2011) by an initial transformation of the system of equations. For $D = \mathrm{Diag}\{a_{ii}\}$, define a transformed system of equations

$$ D^{-1/2} A D^{-1/2} \, D^{1/2} x = D^{-1/2} b \tag{5} $$

or

$$ \tilde{A} \tilde{x} = \tilde{b} \tag{6} $$

Decomposing $\tilde{A}$ as above gives

$$ \tilde{A} = \tilde{L} + \tilde{L}' + I \tag{7} $$

and with

$$ \tilde{M} = \frac{\omega}{2-\omega} \left( \tilde{L} + \frac{1}{\omega} I \right) \left( \tilde{L}' + \frac{1}{\omega} I \right) = \frac{\omega}{2-\omega} \, W W' \tag{8} $$

i.e. $W = \tilde{L} + \frac{1}{\omega} I$,

$$ \tilde{A} = W + W' - \lambda I \tag{9} $$

and $\lambda = (2 - \omega)/\omega$.
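The three-step application of the SSOR preconditioner, steps i) to iii) above, can be sketched in a few lines of Python. This is a minimal illustration only, assuming a small dense matrix stored as a list of lists; the function names `forward_solve`, `backward_solve_transposed` and `ssor_apply` are hypothetical, and a production implementation would operate on sparse storage instead. The final multiplication by $\omega(2-\omega)$ accounts for the scalar factor in (4), which equals 1 for $\omega = 1$.

```python
def forward_solve(T, b):
    """Solve T x = b for a lower-triangular T (dense list of lists)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(T[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / T[i][i]
    return x


def backward_solve_transposed(T, b):
    """Solve T' x = b for a lower-triangular T without forming T'."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(T[j][i] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / T[i][i]
    return x


def ssor_apply(A, r, omega=1.0):
    """Return h = M_SSOR^{-1} r; the matrix M_SSOR itself is never formed."""
    n = len(r)
    # T = D + omega * L: diagonal of A plus omega times its strictly lower triangle.
    T = [[A[i][j] * (1.0 if i == j else omega) if j <= i else 0.0
          for j in range(n)] for i in range(n)]
    t1 = forward_solve(T, r)                   # i)   (D + omega L) t1 = r
    t2 = [A[i][i] * t1[i] for i in range(n)]   # ii)  t2 = D t1
    h = backward_solve_transposed(T, t2)       # iii) (D + omega L)' h = t2
    # Scalar factor omega * (2 - omega) from equation (4).
    scale = omega * (2.0 - omega)
    return [scale * v for v in h]
```

For example, `ssor_apply(A, r)` with a symmetric positive definite `A` returns the vector `h` for which the explicitly formed $M_{SSOR}$ would satisfy $M_{SSOR} h = r$.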
Defining auxiliary vectors $y = W^{-1} r$ and $z = W' d$ (so that $d = W^{-T} z$), with $r = \tilde{A}\tilde{x} - \tilde{b}$ the vector of residuals and $d$ the vector of search directions on the transformed scale $\tilde{x} = D^{1/2} x$, then gives the following PCG algorithm (adapted from Li et al., 2013):

1. For a given vector of starting values for the unknowns, $\tilde{x}_0$, initialize
   $y_0 \leftarrow W^{-1} ( \tilde{A} \tilde{x}_0 - \tilde{b} )$
   $z_0 \leftarrow -\lambda y_0$
   $d_0 \leftarrow W^{-T} z_0$
2. For the $k$-th iterate, compute updated solutions
   $\tilde{x}_k \leftarrow \tilde{x}_{k-1} + \alpha d_{k-1}$ with $\alpha = \dfrac{\lambda \, y_{k-1}' y_{k-1}}{d_{k-1}' (2 z_{k-1} - \lambda d_{k-1})}$
   Check for convergence.
3. If the chosen criterion has not been met, update work vectors
   $y_k \leftarrow y_{k-1} + \alpha \left[ d_{k-1} + W^{-1} (z_{k-1} - \lambda d_{k-1}) \right]$
   $z_k \leftarrow \beta z_{k-1} - \lambda y_k$ with $\beta = \dfrac{y_k' y_k}{y_{k-1}' y_{k-1}}$
   $d_k \leftarrow W^{-T} z_k$
4. At convergence, calculate solutions on the original scale as $x = D^{-1/2} \tilde{x}$.

The major computations per iterate are the products of $W^{-1}$ or $W^{-T}$ and a vector (in step 3). As $W$ is triangular, these can again be obtained without inverting $W$ using triangular solves.

PCG algorithms in animal breeding applications commonly involve a step resetting the search direction at regular intervals (e.g. Tsuruta et al., 2001) to reduce potential problems arising from the accumulation of rounding errors. This can be achieved in the scheme above by replacing the update of work vectors (step 3) for selected iterates with the corresponding calculations from the set-up phase (step 1).

APPLICATION

We demonstrate the utility of the SSOR preconditioner for the example considered by Meyer et al. (2015). In brief, this comprised a set of 11 traits considered in genetic evaluation of Australian meat sheep (generously made available from Meat and Livestock Australia's Sheep Genetics database and the Cooperative Research Centre for Sheep Industry Innovation), with 5.28 million records on 1.77 million animals. Including parents without records, there were 1,995,755 animals, of which 10,944 were genotyped for 48,599 single nucleotide polymorphisms. Genomic relationships were computed following Yang et al. (2010).

The model of analysis included contemporary groups as fixed effects, and animals' additive genetic effects and genetic groups (93 levels) as random effects for all traits. The latter were fitted explicitly, assigning proportions of membership for each animal. Genetic groups and animals' additive genetic effects were fitted assuming the same covariance matrix. In addition, dams' permanent environmental effects (653,068 levels) were fitted as random effects for 3 traits. This resulted in 24,161,124 equations in the mixed model.

Analyses were carried out using either the pedigree based relationship matrix with 6,584,393 elements in its inverse, $A^{-1}$, or combining pedigree and genomic information in a single-step model with 66,455,483 non-zero elements in the inverse of the combined relationship matrix, $H^{-1}$ (half-stored). Furthermore, both the standard multivariate (MV) formulation and the equivalent model using a parameterisation to principal components (PC) (Meyer et al., 2015) were examined.

Computing environment and strategy

Mixed model equations were solved iteratively, using double precision computations in a preconditioned conjugate gradient algorithm, as implemented in the single-step module of our mixed model package WOMBAT (Meyer, 2007). Non-zero elements of the coefficient matrix (half-stored) in the MME were held in core using a combination of sparse matrix and dense storage. Dense diagonal blocks were assigned for genetic groups, considering all traits together, and for genotyped animals. For the MV parameterisation, the latter again used a single large block covering all traits. Ordering equations for genotyped animals within PC, 11 separate diagonal blocks (of size equal to the number of genotyped animals) were used for the PC model, which substantially reduced the memory required.
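The improved SSOR PCG iteration (steps 1 to 4 of the algorithm adapted from Li et al., 2013) can be sketched as follows. This is a minimal illustration under several assumptions that are not part of the source: a small dense symmetric positive definite matrix held as a list of lists, starting values of zero, hypothetical function names, and a stopping rule based on the norm of $y$ relative to its starting value rather than the criterion used in the analyses reported here. Each iterate uses one forward and one backward triangular solve with $W$, and no product with the full coefficient matrix.

```python
import math


def solve_lower(W, b):
    """Forward triangular solve: return x with W x = b, W lower triangular."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(W[i][j] * x[j] for j in range(i))) / W[i][i]
    return x


def solve_lower_transposed(W, b):
    """Backward triangular solve: return x with W' x = b (W' is never formed)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(W[j][i] * x[j] for j in range(i + 1, n))) / W[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ssor_pcg(A, b, omega=1.0, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (dense list of lists)."""
    n = len(b)
    s = [1.0 / math.sqrt(A[i][i]) for i in range(n)]   # diagonal of D^{-1/2}
    # Transformed system (5)-(6): unit diagonal after scaling.
    At = [[s[i] * A[i][j] * s[j] for j in range(n)] for i in range(n)]
    bt = [s[i] * b[i] for i in range(n)]
    lam = (2.0 - omega) / omega
    # W = strictly lower triangle of the scaled matrix plus I / omega, as in (8).
    W = [[At[i][j] if j < i else (1.0 / omega if j == i else 0.0)
          for j in range(n)] for i in range(n)]

    # Step 1: set-up phase, starting values assumed to be zero.
    x = [0.0] * n
    resid = [dot(At[i], x) - bt[i] for i in range(n)]
    y = solve_lower(W, resid)
    z = [-lam * yi for yi in y]
    d = solve_lower_transposed(W, z)
    yy = dot(y, y)
    yy0 = yy

    for _ in range(max_iter):
        # Assumed stopping rule: norm of y relative to its starting value.
        if yy <= tol * tol * yy0:
            break
        # Step 2: update solutions.
        alpha = lam * yy / sum(di * (2.0 * zi - lam * di) for di, zi in zip(d, z))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        # Step 3: update work vectors using two triangular solves.
        t = solve_lower(W, [zi - lam * di for zi, di in zip(z, d)])
        y = [yi + alpha * (di + ti) for yi, di, ti in zip(y, d, t)]
        yy_new = dot(y, y)
        beta = yy_new / yy
        z = [beta * zi - lam * yi for zi, yi in zip(z, y)]
        d = solve_lower_transposed(W, z)
        yy = yy_new
    # Step 4: back-transform to the original scale.
    return [s[i] * x[i] for i in range(n)]
```

Calling `ssor_pcg(A, b)` on a small test system and comparing `A x` with `b` is a quick way to check the reconstruction of the update formulas; resetting the search direction would amount to repeating the set-up phase with the current solutions in place of the zero starting values.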
Sparse matrix storage for the remaining parts held diagonal elements in core and used compressed sparse row format otherwise. Preconditioning schemes compared were a simple diagonal preconditioner (DIAG), a block-diagonal preconditioner (BLOX) and the improved SSOR scheme described above, using $\omega = 1$. Solutions were deemed to have converged when $\alpha \, d'd / x'x$ dropped below […].

Computations were implemented using single- and multi-threaded versions of routines from the BLAS and sparse BLAS (Blackford et al., 2002) and LAPACK libraries (Anderson et al., 1999). Specifically, each PCG iterate for DIAG and BLOX required the product of the coefficient matrix and a vector, which was formed using routines DSYMV and MKL_DCSRSYMV. The multiple vector inner products (a.k.a. "dot products") needed were evaluated using function DDOT. Triangular solves for SSOR employed routines DAXPY and DAXPYI and functions DDOT and DDOTI to parallelize computations within rows or columns. Use of routines DTRSV and MKL_DCSRTRSV for the latter was rejected, as it appeared to increase the computing time required.

For BLOX, the inverses of diagonal blocks were used as preconditioners, except for genotyped animals for MV analyses. This comprised a block for all genetic group effects, separate diagonal blocks for genotyped animals (of dimension 10,944) for each PC, and diagonal blocks of size equal to the number of traits for which they were fitted for the remaining random effects levels. For MV analyses, a Cholesky decomposition of the dense diagonal block for all genotyped animals and traits was carried out and used as a preconditioner in a triangular solve. These calculations were performed using LAPACK routines DPOTRF, DPOTRI and DPOTRS and BLAS routine DSYMV.

Computations were carried out under Linux on a shared machine with 512 GB of RAM and 28 Intel Xeon CPU E[…] cores (Intel Corporation, Santa Clara, CA), rated at 2.6 GHz, with a cache size of 35 MB. BLAS and LAPACK routines used were loaded from the Intel Math Kernel Library (Intel Corporation, Santa Clara, CA).

RESULTS

Numbers of iterates and computing time required for each of the 24 analyses are summarized in Table 1. Results clearly show the impact of the preconditioner used on the number of iterates required. Results for BLOX differ somewhat from those reported by Meyer et al. (2015) due to a slightly less stringent convergence criterion used compared to the earlier study and tweaks in the implementation since. Correlations between solutions for corresponding analyses were at least […].

For the standard multivariate parameterization, BLOX dramatically reduced the number of iterates required compared to DIAG. This was less pronounced for the PC model, suggesting that de-correlating effects in the transformation to PC scale already achieved a considerable part of the benefits otherwise obtained by considering all traits simultaneously when using BLOX. The SSOR preconditioner further reduced the number of iterates required in all cases, on average to about a third of the corresponding number for DIAG.

Results for computing times required did not quite match the differences in numbers of iterates needed.
In the main, this reflected differences in efficiency of implementation and, for multi-threaded analyses, in scope for parallelization. While Li et al. (2013) emphasized that computations required per iterate for the improved SSOR PCG algorithm would be very similar to those for a standard (non-preconditioned) conjugate gradient scheme, times per iterate for our example were higher throughout for SSOR than for DIAG. Nevertheless, for single processor analyses, the overall execution time was reduced by 35% to 60%. Parameterising to principal components yielded comparable reductions in execution time for all preconditioners.

With 28 processors available, a different pattern emerged for multi-threaded analyses. Each iterate of DIAG and BLOX involved the product of the coefficient matrix and a vector, which was amenable to parallel processing (implemented through highly optimized library routines). In contrast, SSOR required two triangular solves which were carried out in order and thus offered far less opportunity for parallel execution. Similarly, times for BLOX were substantially higher than for DIAG for the MV single-step model. This was due to the size of the dense diagonal block (120,384) and the fact that using its Cholesky factor (rather than the inverse) as the preconditioner again involved a triangular solve.

DISCUSSION

A SSOR preconditioner for a PCG algorithm appears not to have been considered previously in the context of genetic evaluation. Applications for engineering problems have been described, for instance, by Chen et al. (2002), Han and Zhang (2011), Li et al. (2013) and Meng et al. (2016), with favourable reports on convergence rates achieved. Results demonstrate that, using the improved version of Li et al. (2013), it can substantially reduce the computational requirements for iterative solution of mixed model equations. Moreover, it requires about the same memory as the diagonal preconditioner and is easier to implement than a block-diagonal scheme.

The SSOR PCG scheme described is most useful for applications where the MME can be held in core or where selected parts of the coefficient matrix can be read strategically from out-of-core memory. With large amounts of RAM available for modern hardware, this is feasible for many genetic evaluation schemes for small to moderately large populations. The SSOR preconditioner can also reduce computational requirements for maximum likelihood analyses to estimate variance components which employ Monte Carlo techniques and require multiple solutions of the MME for each iterate, e.g. Matilainen et al. (2012). The scheme is likely to be less beneficial for extremely large applications using an "iteration on data" type strategy, as multiple passes through the data would be needed in each iterate, which may quickly erode the computational savings afforded by the SSOR preconditioner per se.

However, in the era of multi-threaded and highly parallel computing, the choice of algorithm needs to consider the hardware constellation targeted.
Using all processors available, there was little advantage of SSOR over DIAG in terms of overall execution time. Competitive performance of the diagonal preconditioner for parallel computing applications has been reported elsewhere (Pini and Gambolati, 1990). As outlined above, this was due to triangular solves required in each iterate, which limited the scope for parallel processing to operations within each row or column. Studies examining PCG algorithms for massively parallel processors have thus taken a different route, suggesting to approximate $M_{SSOR}^{-1}$ to avoid the need for ordered, triangular solution schemes, albeit at the expense of substantial additional memory requirements (e.g. Helfenstein and Koko, 2012). Alternative proposals to boost parallel performance of triangular solves range from identification of independent levels in the triangular matrix and appropriate scheduling (Mayer, 2009) to use of iterative schemes (Anzt et al., 2015).
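The idea of identifying independent levels in a triangular matrix can be illustrated with a small sketch. This is a generic illustration of level scheduling, not the specific algorithm of Mayer (2009); the input format (for each row, the list of column indices of its non-zero off-diagonal elements) and the function names are assumptions made for this example. Rows assigned to the same level depend only on rows in earlier levels, so their unknowns could be computed concurrently in a forward triangular solve.

```python
def triangular_levels(rows):
    """Assign a level to each row of a sparse lower-triangular matrix.

    rows[i] holds the column indices j < i of non-zero off-diagonal
    elements in row i. A row with no such elements is in level 0;
    otherwise its level is one more than the highest level it depends on.
    """
    level = [0] * len(rows)
    for i, cols in enumerate(rows):
        if cols:
            level[i] = 1 + max(level[j] for j in cols)
    return level


def group_by_level(level):
    """Return the row indices of each level, in order of increasing level."""
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical sparsity pattern of a 6 x 6 lower-triangular matrix.
pattern = [[], [], [0], [1], [2, 3], []]
levels = triangular_levels(pattern)
schedule = group_by_level(levels)
```

For the hypothetical pattern above, rows 0, 1 and 5 have no dependencies, rows 2 and 3 depend only on those, and row 4 must wait for rows 2 and 3. A bidiagonal matrix, in contrast, yields one row per level and offers no such parallelism.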
IMPLICATIONS

The improved preconditioned conjugate gradient algorithm with SSOR preconditioner described can substantially reduce computing times for iterative solution of mixed model equations in quantitative genetic applications. It is suitable as a drop-in replacement for existing methods for schemes setting up the mixed model equations explicitly, and most advantageous for computing environments with moderate amounts of parallelization.

LITERATURE CITED

Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, 1999. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edn.
Anzt, H., E. Chow, and J. Dongarra, 2015. Iterative sparse triangular solves for preconditioning. In: J. L. Träff, S. Hunold, and F. Versaci, eds., Euro-Par 2015: Parallel Processing, vol. […] of Theoretical Computer Science and General Issues. Springer.
Benzi, M., 2002. Preconditioning techniques for large linear systems: A survey. J. Comput. Phys. 182: doi: /jcph
Blackford, L., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, 2002. An updated set of Basic Linear Algebra Subprograms (BLAS). ACM Trans. Math. Softw. 28:
Chen, R.-S., E. K.-N. Yung, C. H. Chan, D. X. Wang, and D. G. Fang, 2002. Application of the SSOR preconditioned CG algorithm to the vector FEM for 3D full-wave analysis of electromagnetic-field boundary-value problems. IEEE Trans. Microw. Theory Techn. 50:
Han, L. and Z. Zhang, 2011. Application of SSOR-PCG method with improved iteration format in FEM simulation of massive concrete. Water Sci. Engineer. 4: doi: /j.issn
Helfenstein, R. and J. Koko, 2012. Parallel preconditioned conjugate gradient algorithm on GPU. J. Comput. Appl. Math. 236: doi: /j.cam
Li, G., C. Tang, and L. Li, 2013. High-efficiency improved symmetric successive over-relaxation preconditioned conjugate gradient method for solving large-scale finite element linear equations. Appl. Math. Mech. 34: doi: /s x.
Matilainen, K., E. Mäntysaari, M. Lidauer, I. Strandén, and R. Thompson, 2012. Employing a Monte Carlo algorithm in expectation maximization restricted maximum likelihood estimation of the linear mixed model. J. Anim. Breed. Genet. 129: doi: /j x.
Mayer, J., 2009. Parallel algorithms for solving linear systems with sparse triangular matrices. Computing 86: doi: /s
Meng, Z., F. Li, X. Xu, D. Huang, and D. Zhang, 2016. Fast inversion of gravity data using the symmetric successive over-relaxation (SSOR) preconditioned conjugate gradient algorithm. Explor. Geophys. 00: Published online 16 February. doi: /EG
Meyer, K., 2007. WOMBAT, a tool for mixed model analyses in quantitative genetics by REML. J. Zhejiang Univ. SCIENCE B 8: doi: /jzus.2007.b0815.
Meyer, K., A. Swan, and B. Tier, 2015. Technical note: Genetic principal component models for multi-trait single-step genomic evaluation. J. Anim. Sci. 93: doi: /jas
Pini, G. and G. Gambolati, 1990. Is a simple diagonal scaling the best preconditioner for conjugate gradients on supercomputers? Adv. Water Resour. 13: doi: / (90)90006-P.
Saad, Y., 2003. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2nd edn.
Strandén, I. and M. Lidauer, 1999. Solving large mixed linear models using preconditioned conjugate gradient iteration. J. Dairy Sci. 82: doi: /jds.S (99)
Strandén, I., S. Tsuruta, and I. Misztal, 2002. Simple preconditioners for the conjugate gradient method: experience with test day models. J. Anim. Breed. Genet. 119: doi: /j x.
Tsuruta, S., I. Misztal, and I. Strandén, 2001. Use of the preconditioned conjugate gradient algorithm as a generic solver for mixed-model equations in animal breeding applications. J. Anim. Sci. 79: doi: / x.
Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A. Madden, A. C. Heath, N. G. Martin, G. W. Montgomery, M. E. Goddard, and P. M. Visscher, 2010. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42: doi: /ng
Table 1: Characteristics of the mixed model equations and computing requirements for diagonal (DIAG), block-diagonal (BLOX) and symmetric successive over-relaxation (SSOR) preconditioning schemes for single- and multi-threaded computations

Threads  Rel.(a)  Param.(b)  NNZ(c)   No. of iterates            Elapsed time (h)
                                      DIAG    BLOX    SSOR       DIAG   BLOX   SSOR
Single   A^-1     MV           918    5,550   1,999   1,[…]      […]    […]    […]
Single   A^-1     PC         1,377    2,784   2,109   1,[…]      […]    […]    […]
Single   H^-1     MV         8,162    6,691   2,910   1,[…]      […]    […]    […]
Single   H^-1     PC         2,035    3,921   2,984   1,[…]      […]    […]    […]
Multi    A^-1     MV           918    5,521   1,991   1,[…]      […]    […]    […]
Multi    A^-1     PC         1,377    2,803   2,107   1,[…]      […]    […]    […]
Multi    H^-1     MV         8,162    6,681   2,889   1,[…]      […]    […]    […]
Multi    H^-1     PC         2,035    3,929   2,965   1,[…]      […]    […]    […]

(a) Relationship matrix: A^-1 pedigree, H^-1 single step
(b) Parameterization: MV standard multivariate, PC principal components
(c) No. of non-zero elements in one triangle of coefficient matrix; in million
More informationNumerical Methods to Solve 2-D and 3-D Elliptic Partial Differential Equations Using Matlab on the Cluster maya
Numerical Methods to Solve 2-D and 3-D Elliptic Partial Differential Equations Using Matlab on the Cluster maya David Stonko, Samuel Khuvis, and Matthias K. Gobbert (gobbert@umbc.edu) Department of Mathematics
More information(Sparse) Linear Solvers
(Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationPorting the NAS-NPB Conjugate Gradient Benchmark to CUDA. NVIDIA Corporation
Porting the NAS-NPB Conjugate Gradient Benchmark to CUDA NVIDIA Corporation Outline! Overview of CG benchmark! Overview of CUDA Libraries! CUSPARSE! CUBLAS! Porting Sequence! Algorithm Analysis! Data/Code
More informationMatrix-free IPM with GPU acceleration
Matrix-free IPM with GPU acceleration Julian Hall, Edmund Smith and Jacek Gondzio School of Mathematics University of Edinburgh jajhall@ed.ac.uk 29th June 2011 Linear programming theory Primal-dual pair
More informationChapter 14: Matrix Iterative Methods
Chapter 14: Matrix Iterative Methods 14.1INTRODUCTION AND OBJECTIVES This chapter discusses how to solve linear systems of equations using iterative methods and it may be skipped on a first reading of
More informationBlocked Schur Algorithms for Computing the Matrix Square Root
Blocked Schur Algorithms for Computing the Matrix Square Root Edvin Deadman 1, Nicholas J. Higham 2,andRuiRalha 3 1 Numerical Algorithms Group edvin.deadman@nag.co.uk 2 University of Manchester higham@maths.manchester.ac.uk
More informationIterative Sparse Triangular Solves for Preconditioning
Iterative Sparse Triangular Solves for Preconditioning Hartwig Anzt 1(B), Edmond Chow 2, and Jack Dongarra 1 1 University of Tennessee, Knoxville, TN, USA hanzt@icl.utk.edu, dongarra@eecs.utk.edu 2 Georgia
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 4 Sparse Linear Systems Section 4.3 Iterative Methods Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationIntel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager
Intel Math Kernel Library (Intel MKL) BLAS Victor Kostin Intel MKL Dense Solvers team manager Intel MKL BLAS/Sparse BLAS Original ( dense ) BLAS available from www.netlib.org Additionally Intel MKL provides
More informationSparse Matrices. Mathematics In Science And Engineering Volume 99 READ ONLINE
Sparse Matrices. Mathematics In Science And Engineering Volume 99 READ ONLINE If you are looking for a ebook Sparse Matrices. Mathematics in Science and Engineering Volume 99 in pdf form, in that case
More informationS0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS
S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir
More informationModule 5.5: nag sym bnd lin sys Symmetric Banded Systems of Linear Equations. Contents
Module Contents Module 5.5: nag sym bnd lin sys Symmetric Banded Systems of nag sym bnd lin sys provides a procedure for solving real symmetric or complex Hermitian banded systems of linear equations with
More informationA priori power estimation of linear solvers on multi-core processors
A priori power estimation of linear solvers on multi-core processors Dimitar Lukarski 1, Tobias Skoglund 2 Uppsala University Department of Information Technology Division of Scientific Computing 1 Division
More informationInclusion of Aleatory and Epistemic Uncertainty in Design Optimization
10 th World Congress on Structural and Multidisciplinary Optimization May 19-24, 2013, Orlando, Florida, USA Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization Sirisha Rangavajhala
More informationStorage Formats for Sparse Matrices in Java
Storage Formats for Sparse Matrices in Java Mikel Luján, Anila Usman, Patrick Hardie, T.L. Freeman, and John R. Gurd Centre for Novel Computing, The University of Manchester, Oxford Road, Manchester M13
More informationPreconditioning for linear least-squares problems
Preconditioning for linear least-squares problems Miroslav Tůma Institute of Computer Science Academy of Sciences of the Czech Republic tuma@cs.cas.cz joint work with Rafael Bru, José Marín and José Mas
More information2nd Introduction to the Matrix package
2nd Introduction to the Matrix package Martin Maechler and Douglas Bates R Core Development Team maechler@stat.math.ethz.ch, bates@r-project.org September 2006 (typeset on October 7, 2007) Abstract Linear
More informationOutline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency
1 2 Parallel Algorithms for Linear Algebra Richard P. Brent Computer Sciences Laboratory Australian National University Outline Basic concepts Parallel architectures Practical design issues Programming
More informationOn Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy
On Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy Jan Verschelde joint with Genady Yoffe and Xiangcheng Yu University of Illinois at Chicago Department of Mathematics, Statistics,
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More informationSparse Matrix Libraries in C++ for High Performance. Architectures. ferent sparse matrix data formats in order to best
Sparse Matrix Libraries in C++ for High Performance Architectures Jack Dongarra xz, Andrew Lumsdaine, Xinhui Niu Roldan Pozo z, Karin Remington x x Oak Ridge National Laboratory z University oftennessee
More informationIntel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager
Intel Math Kernel Library (Intel MKL) Sparse Solvers Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager Copyright 3, Intel Corporation. All rights reserved. Sparse
More informationImplicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC
Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017 Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC Takuma
More informationVIII/2015 TECHNICAL REFERENCE GUIDE FOR
MiX99 Solving Large Mixed Model Equations Release VIII/2015 TECHNICAL REFERENCE GUIDE FOR MiX99 PRE-PROCESSOR Copyright 2015 Last update: Aug 2015 Preface Development of MiX99 was initiated to allow more
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationNumerically Stable Real-Number Codes Based on Random Matrices
Numerically Stable eal-number Codes Based on andom Matrices Zizhong Chen Innovative Computing Laboratory Computer Science Department University of Tennessee zchen@csutkedu Abstract Error correction codes
More informationNAG Fortran Library Routine Document F04CAF.1
F04 Simultaneous Linear Equations NAG Fortran Library Routine Document Note: before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised
More informationA class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines
Available online at www.sciencedirect.com Procedia Computer Science 9 (2012 ) 17 26 International Conference on Computational Science, ICCS 2012 A class of communication-avoiding algorithms for solving
More informationIntel Math Kernel Library 10.3
Intel Math Kernel Library 10.3 Product Brief Intel Math Kernel Library 10.3 The Flagship High Performance Computing Math Library for Windows*, Linux*, and Mac OS* X Intel Math Kernel Library (Intel MKL)
More informationThe Basic Linear Algebra Subprograms (BLAS) are an interface to commonly used fundamental linear algebra operations.
TITLE Basic Linear Algebra Subprograms BYLINE Robert A. van de Geijn Department of Computer Science The University of Texas at Austin Austin, TX USA rvdg@cs.utexas.edu Kazushige Goto Texas Advanced Computing
More informationContents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet
Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage
More informationDeveloping a High Performance Software Library with MPI and CUDA for Matrix Computations
Developing a High Performance Software Library with MPI and CUDA for Matrix Computations Bogdan Oancea 1, Tudorel Andrei 2 1 Nicolae Titulescu University of Bucharest, e-mail: bogdanoancea@univnt.ro, Calea
More information2 Fundamentals of Serial Linear Algebra
. Direct Solution of Linear Systems.. Gaussian Elimination.. LU Decomposition and FBS..3 Cholesky Decomposition..4 Multifrontal Methods. Iterative Solution of Linear Systems.. Jacobi Method Fundamentals
More informationSUMMARY. solve the matrix system using iterative solvers. We use the MUMPS codes and distribute the computation over many different
Forward Modelling and Inversion of Multi-Source TEM Data D. W. Oldenburg 1, E. Haber 2, and R. Shekhtman 1 1 University of British Columbia, Department of Earth & Ocean Sciences 2 Emory University, Atlanta,
More informationAsreml-R: an R package for mixed models using residual maximum likelihood
Asreml-R: an R package for mixed models using residual maximum likelihood David Butler 1 Brian Cullis 2 Arthur Gilmour 3 1 Queensland Department of Primary Industries Toowoomba 2 NSW Department of Primary
More informationNAG Library Chapter Introduction. F16 Further Linear Algebra Support Routines
NAG Library Chapter Introduction Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 3 Recommendations on Choice and Use of Available Routines... 2 3.1 Naming Scheme... 2 3.1.1 NAGnames...
More informationVery fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards
Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop
More informationIterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms
Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms By:- Nitin Kamra Indian Institute of Technology, Delhi Advisor:- Prof. Ulrich Reude 1. Introduction to Linear
More informationBatched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs
Workshop on Batched, Reproducible, and Reduced Precision BLAS Atlanta, GA 02/25/2017 Batched Factorization and Inversion Routines for Block-Jacobi Preconditioning on GPUs Hartwig Anzt Joint work with Goran
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 5: Sparse Linear Systems and Factorization Methods Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 18 Sparse
More informationApproaches to Parallel Implementation of the BDDC Method
Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague
More informationGPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)
GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization
More informationAnalysis of the GCR method with mixed precision arithmetic using QuPAT
Analysis of the GCR method with mixed precision arithmetic using QuPAT Tsubasa Saito a,, Emiko Ishiwata b, Hidehiko Hasegawa c a Graduate School of Science, Tokyo University of Science, 1-3 Kagurazaka,
More informationPerformance Evaluation of a New Parallel Preconditioner
Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller October 994 CMU-CS-94-25 Marco Zagha School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 This
More informationParallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors
Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors Andrés Tomás 1, Zhaojun Bai 1, and Vicente Hernández 2 1 Department of Computer
More informationCHAO YANG. Early Experience on Optimizations of Application Codes on the Sunway TaihuLight Supercomputer
CHAO YANG Dr. Chao Yang is a full professor at the Laboratory of Parallel Software and Computational Sciences, Institute of Software, Chinese Academy Sciences. His research interests include numerical
More informationPreconditioning Linear Systems Arising from Graph Laplacians of Complex Networks
Preconditioning Linear Systems Arising from Graph Laplacians of Complex Networks Kevin Deweese 1 Erik Boman 2 1 Department of Computer Science University of California, Santa Barbara 2 Scalable Algorithms
More informationUsing BLUPF90 UGA
Using BLUPF90 UGA 05-2018 BLUPF90 family programs All programs are controled by the SAME paramenter file. Extra options could be used to set non-default behaviour of each program Understanding parameter
More informationQ. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R.
Progress In Electromagnetics Research Letters, Vol. 9, 29 38, 2009 AN IMPROVED ALGORITHM FOR MATRIX BANDWIDTH AND PROFILE REDUCTION IN FINITE ELEMENT ANALYSIS Q. Wang National Key Laboratory of Antenna
More informationGS3. Andrés Legarra. March 5, Genomic Selection Gibbs Sampling Gauss Seidel
GS3 Genomic Selection Gibbs Sampling Gauss Seidel Andrés Legarra March 5, 2008 andres.legarra [at] toulouse.inra.fr INRA, UR 631, F-31326 Auzeville, France 1 Contents 1 Introduction 3 1.1 History...............................
More informationOpenFOAM + GPGPU. İbrahim Özküçük
OpenFOAM + GPGPU İbrahim Özküçük Outline GPGPU vs CPU GPGPU plugins for OpenFOAM Overview of Discretization CUDA for FOAM Link (cufflink) Cusp & Thrust Libraries How Cufflink Works Performance data of
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationAnalysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms
Analysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms H. Anzt, V. Heuveline Karlsruhe Institute of Technology, Germany
More informationCOMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION
International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, Sep. 15 www.ijcea.com ISSN 2321-3469 COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION
More informationStep-by-Step Guide to Advanced Genetic Analysis
Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options
More informationIntroduction to Optimization
Introduction to Optimization Second Order Optimization Methods Marc Toussaint U Stuttgart Planned Outline Gradient-based optimization (1st order methods) plain grad., steepest descent, conjugate grad.,
More informationHigh-Performance Implementation of the Level-3 BLAS
High-Performance Implementation of the Level- BLAS KAZUSHIGE GOTO The University of Texas at Austin and ROBERT VAN DE GEIJN The University of Texas at Austin A simple but highly effective approach for
More informationPerformance of Implicit Solver Strategies on GPUs
9. LS-DYNA Forum, Bamberg 2010 IT / Performance Performance of Implicit Solver Strategies on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Abstract: The increasing power of GPUs can be used
More information