Performance Strategies for Parallel Mathematical Libraries Based on Historical Knowledgebase
1 Performance Strategies for Parallel Mathematical Libraries Based on Historical Knowledgebase. CScADS Workshop. Eduardo Cesar, Anna Morajko, Ihab Salawdeh, Universitat Autònoma de Barcelona
2 Outline: Objective; Mathematical Library (PETSc); Performance Methodology (Knowledgebase, Data Analysis, Load balancing, Memory pre-allocation); Summary (what we have, what we would like)
3 Objective: design a methodology, based on the input data, that automatically chooses among the existing solving parameters to improve a PETSc application's performance
4 (PETSc abstraction-level diagram, from highest to lowest: Application Code; other higher-level solvers; PDE solvers: SNES, TS time stepping; SLES mathematical algorithms: KSP, PC; Draw; data structures: Matrices, Vectors, Index Sets; BLAS, LAPACK, MPI.)
5 Data structures (Matrices, Vectors, Index Sets) are the fundamental objects for storing linear operators. There is no data structure appropriate for all problems, so PETSc supports several: dense storage, where each process stores its entries in the usual array style; compressed sparse row storage, where only the nonzero values are stored; block compressed sparse row storage; and block diagonal storage.
6 (The abstraction-level diagram is shown again: Application Code; other higher-level solvers; PDE solvers: SNES, TS time stepping; SLES mathematical algorithms: KSP, PC; Draw; data structures: Matrices, Vectors, Index Sets; BLAS, LAPACK, MPI.)
7 Mathematical algorithms (KSP, PC). To solve a linear system you first compute the matrix's preconditioner (PC) and then apply a Krylov subspace method (KSP). Different mathematical algorithms exist both for the preconditioner (Jacobi, LU, etc.) and for the KSP (GMRES, CG, etc.).
8 (Workflow diagram: pattern recognition and data mining over the knowledgebase, subject to mathematical constraints, select solver parameters; the knowledgebase holds entries such as "mpiaij richardson jacobi", "mpiaij chebychev asm" and "dense cgs bjacobi"; load balancing and memory preallocation are applied before solving the problem, and the control of the solution feeds back into the knowledgebase.)
9 PETSc solves Ax = b: the user code performs application initialization, initializes A and b, and does the post-processing; the PETSc code provides the main routine and the linear solvers (KSP, PC).
10 With tuning, a tuning-code layer (pattern recognition, knowledgebase, data mining, solver-parameter decision) is inserted between the user code and the PETSc solve.
11 (The same tuned execution flow, with the tuning code intercepting the call to the linear solvers.)
12 The knowledgebase contains the performance characterization of historical executions for different input data and hardware configurations. It is organized according to the most common matrix patterns and the matrix size.
13 Knowledgebase excerpt (columns: Input Type, Input Size, No. Processors, Memory Representation, KSP, PC, Total Time (s), Matrix Pattern; the pattern is shown as an image and most measured values were garbled beyond recovery — the recoverable rows are):

Input Type       Input Size  Procs  Mem. Repr.  KSP    PC       Total Time (s)
Around-Diagonal  952x952     8      mpiaij      bcgs   asm      -
Around-Diagonal  952x952     8      mpiaij      bcgs   bjacobi  25,5934
Around-Diagonal  952x952     8      mpiaij      bcgs   jacobi   -
Around-Diagonal  952x952     8      mpiaij      bcgsl  asm      222,55694
Around-Diagonal  952x952     8      mpiaij      bcgsl  bjacobi  89,236794
Distributed      9242x9242   -      mpiaij      bcgs   asm      522,5523
Distributed      9242x9242   -      mpiaij      bcgs   bjacobi  455,43445
Distributed      9242x9242   -      mpiaij      bcgs   jacobi   459,62954
Distributed      9242x9242   -      mpiaij      bcgsl  asm      457,2476
Distributed      9242x9242   -      mpiaij      bcgsl  bjacobi  456,5858
Distributed      9242x9242   -      mpiaij      bcgsl  jacobi   455,3952
14 (Matrix-pattern figure, characterized by its number of non-zero blocks and its percentage of non-zeros on the diagonal.)
15 (A second example pattern, characterized the same way.)
16 (Comparison of two patterns: one with 25 non-zero blocks, one with 8 non-zero blocks and 5% non-zero diagonal; diagonal distance = 5%. Several digits of these figures were lost in transcription.)
17 Number of similar entries = 39; number of different entries = 25.
18 (Table comparing two matrices, Mat 1 and Mat 2: number of similar entries (intersection), number of different entries, union, matrix similarity (fitness), city-block (Manhattan) distance, and diagonal density distance; fitness distance is given relative to a maximum distance of 64. The values were lost in transcription.)
19 (The same comparison with the normalized city-block distance and normalized diagonal density distance.)
20 (The workflow diagram is shown again: pattern recognition and data mining over the knowledgebase select the solver parameters; load balancing and memory preallocation precede the solve.)
21 (Figures comparing three row distributions: uniform matrix distribution using PETSC_DECIDE; non-uniform matrix distribution using PETSC_DECIDE; non-uniform matrix distribution using the load-balancing module.)
22 Example: assembling a tridiagonal matrix and letting PETSc determine the cross-processor distribution (the loop uses global row indices; rows 0 and N-1 must be set separately):

    Mat A;
    PetscInt i, column[3];
    PetscScalar value[3];
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);  /* PETSc decides the distribution */
    MatSetFromOptions(A);
    value[0] = -1.0; value[1] = 2.0; value[2] = -1.0;
    if (rank == 0) {
        for (i = 1; i < N - 1; i++) {                  /* i is a global row index */
            column[0] = i - 1; column[1] = i; column[2] = i + 1;
            MatSetValues(A, 1, &i, 3, column, value, INSERT_VALUES);
        }
    }
    /* rows 0 and N-1 must also be set */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
23 (Table: line-based distribution of a 9242x9242 matrix using PETSC_DECIDE — per-process number of lines and number of values, with totals; each process gets the same number of lines but very different numbers of values. The counts were lost in transcription.)
24 (Table: value-based distribution of the same 9242x9242 matrix — unequal numbers of lines per process, but roughly equal numbers of values.)
25 After specifying the data structure's properties and the way it will be distributed, the memory must be allocated. The options are:
- Automatic: on each SetValues call, PETSc allocates the memory for that value (one malloc per nonzero value).
- Each processor allocates memory for the full size of its rows (one malloc).
- Allocate the memory for the local submatrix according to the densest row (one malloc).
- Calculate the number of nonzero values per row and allocate the memory for each row (one malloc per row).
Each processor loads its own data from the file.
26 (Charts comparing execution time in seconds and performance across the phases Create, Setting, Reading, Assembly and Solve.)
27 Summary: We have designed a methodology based on input data that automatically chooses among existing solving parameters to improve a PETSc application's performance. The methodology follows data-mining techniques and is based on comparing the input matrix with a set of predefined patterns stored in a knowledgebase. The load-balancing method automatically hands data out to all processes based on the distribution of nonzero values across the matrix. Choosing a suitable memory pre-allocation method improves memory utilization and reduces the cost of memory allocation.
28 What we have: components for analyzing the input matrix; a database with hundreds of executions; components for load balancing and memory preallocation; and the ability to easily produce a wrapper for the PETSc calls. What we would like: a more detailed characterization of the hardware (cache, memory, FLOPS, number of cores).
More informationAdaptable benchmarks for register blocked sparse matrix-vector multiplication
Adaptable benchmarks for register blocked sparse matrix-vector multiplication Berkeley Benchmarking and Optimization group (BeBOP) Hormozd Gahvari and Mark Hoemmen Based on research of: Eun-Jin Im Rich
More informationBrief notes on setting up semi-high performance computing environments. July 25, 2014
Brief notes on setting up semi-high performance computing environments July 25, 2014 1 We have two different computing environments for fitting demanding models to large space and/or time data sets. 1
More informationTomonori Kouya Shizuoka Institute of Science and Technology Toyosawa, Fukuroi, Shizuoka Japan. October 5, 2018
arxiv:1411.2377v1 [math.na] 10 Nov 2014 A Highly Efficient Implementation of Multiple Precision Sparse Matrix-Vector Multiplication and Its Application to Product-type Krylov Subspace Methods Tomonori
More informationA GPU Sparse Direct Solver for AX=B
1 / 25 A GPU Sparse Direct Solver for AX=B Jonathan Hogg, Evgueni Ovtchinnikov, Jennifer Scott* STFC Rutherford Appleton Laboratory 26 March 2014 GPU Technology Conference San Jose, California * Thanks
More informationSparse Matrices Direct methods
Sparse Matrices Direct methods Iain Duff STFC Rutherford Appleton Laboratory and CERFACS Summer School The 6th de Brùn Workshop. Linear Algebra and Matrix Theory: connections, applications and computations.
More informationnag sparse nsym sol (f11dec)
f11 Sparse Linear Algebra f11dec nag sparse nsym sol (f11dec) 1. Purpose nag sparse nsym sol (f11dec) solves a real sparse nonsymmetric system of linear equations, represented in coordinate storage format,
More informationPreliminary Implementation of PETSc Using GPUs
Preliminary Implementation of PETSc Using GPUs Victor Minden, Barry Smith, and Matthew G. Knepley To appear in the Proceedings of the 2010 International Workshop of GPU Solutions to Multiscale Problems
More informationSupercomputing and Science An Introduction to High Performance Computing
Supercomputing and Science An Introduction to High Performance Computing Part VII: Scientific Computing Henry Neeman, Director OU Supercomputing Center for Education & Research Outline Scientific Computing
More informationSparse Matrix Formats
Christopher Bross Friedrich-Alexander-Universität Erlangen-Nürnberg Motivation Sparse Matrices are everywhere Sparse Matrix Formats C. Bross BGCE Research Day, Erlangen, 09.06.2016 2/16 Motivation Sparse
More informationMixed Precision Methods
Mixed Precision Methods Mixed precision, use the lowest precision required to achieve a given accuracy outcome " Improves runtime, reduce power consumption, lower data movement " Reformulate to find correction
More informationHierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs
JH160041-NAHI Hierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs Rio Yokota(Tokyo Institute of Technology) Hierarchical low-rank approximation methods such as H-matrix, H2-matrix,
More information