Sparse LU Factorization for Parallel Circuit Simulation on GPUs
1 Sparse LU Factorization for Parallel Circuit Simulation on GPUs
Ling Ren, Xiaoming Chen, Yu Wang, Chenxi Zhang, Huazhong Yang
Nano-scale Integrated Circuit and System Lab., Department of Electronic Engineering, Tsinghua University
2 Motivation
The bottleneck of parallel SPICE simulators
3 Related works
- Dense LU [Volkov2008, Tomov2010]: very efficient on GPUs (850 Gflop/s)
- Sparse LU
  - SuperLU [SuperLU1999] and PARDISO [Pardiso2002]: supernodes (dense blocks); [Christen2007] processes the dense blocks on the GPU
  - UMFPACK, MUMPS, WSMP: multifrontal
  - However, extremely sparse matrices contain no dense blocks
  - KLU [KLU2010]: designed for circuit matrices, without supernodes
- G/P left-looking algorithm [G/P 1988]: sequential version only
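4 Algorithm: left-looking
- Each column k is updated in sequence by columns to its left, using vector multiply-and-add (MAD) operations. Only the columns j < k whose entry U(j,k) is nonzero contribute.
- The nonzero structure of U therefore determines the dependencies. These dependencies form the EGraph [Chen 2011]: nodes are columns, edges are vector MADs.
[Figure: (a) upper triangular matrix U, marking the columns read, the column written, and the nonzeros; (b) the corresponding EGraph]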
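The left-looking update can be sketched in Python as shown below. This is a minimal sketch, not the authors' code, and the function name `left_looking_lu` is hypothetical. It assumes the matrix has already been pivoted, so it performs no pivoting itself. It uses dense lists of rows only to stay short. A real G/P-style solver stores L and U in compressed sparse column form and finds the nonzero entries U(j,k) with a symbolic depth-first search. The point the sketch shows is that column k only does a MAD with column j when U(j,k) is nonzero: those pairs are exactly the EGraph edges.

```python
def left_looking_lu(A):
    """Left-looking LU factorization A = L U without pivoting.

    A is a square matrix given as a list of rows and is assumed to be
    pre-pivoted. Returns (L, U) as lists of rows, with L unit lower triangular.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        x = [float(A[i][k]) for i in range(n)]  # working copy of column k
        for j in range(k):
            # x[j] is final here and equals U[j][k]: only columns < j touch it.
            if x[j] == 0.0:
                continue  # U[j][k] == 0, so column k does not depend on column j
            for i in range(j + 1, n):
                x[i] -= x[j] * L[i][j]  # vector multiply-and-add (MAD)
        for i in range(k + 1):
            U[i][k] = x[i]
        if x[k] == 0.0:
            raise ZeroDivisionError(f"zero pivot in column {k}")
        for i in range(k + 1, n):
            L[i][k] = x[i] / x[k]
    return L, U
```

For example, with A = [[4, 3, 0], [6, 3, 1], [0, 2, 5]], column 2 skips column 0 because U(0,2) = 0. Column 2 is updated only by column 1.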
5 Algorithm analysis: parallelism
- Divide the EGraph into levels; columns in the same level are independent of each other.
- Two scheduling modes: cluster mode and pipeline mode.
[Figures: a sample EGraph; timing order in pipeline mode]
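Dividing the EGraph into levels can be sketched as a longest-path labelling. Each column gets a level one greater than the deepest column it depends on, so columns that share a level cannot depend on each other. This is a minimal sketch under stated assumptions. The function names are hypothetical. The input is assumed to be the off-diagonal nonzero pattern of U after symbolic factorization, that is, including fill-in. The cluster-mode and pipeline-mode schedulers themselves are not shown.

```python
def egraph_levels(u_pattern):
    """Assign each column of U to a level of the EGraph.

    u_pattern[k] lists the rows j < k with U[j][k] != 0. Each such j is an
    edge j -> k: column j must be finished before it can update column k.
    """
    level = []
    for k, rows in enumerate(u_pattern):
        deps = [j for j in rows if j < k]  # ignore the diagonal if present
        # Columns are visited in order, so level[j] is known for every j < k.
        level.append(1 + max(level[j] for j in deps) if deps else 0)
    return level


def columns_by_level(level):
    """Group column indices by level; columns in one group are independent."""
    groups = {}
    for k, lv in enumerate(level):
        groups.setdefault(lv, []).append(k)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical 5-column pattern: column 2 needs 0, column 3 needs 1 and 2,
# column 4 needs 0.
pattern = [[], [], [0], [1, 2], [0]]
print(columns_by_level(egraph_levels(pattern)))  # [[0, 1], [2, 4], [3]]
```

A level-by-level schedule would factor each returned group in parallel, with a synchronization point between consecutive groups.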
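6 GPU implementation: avoiding deadlock
- Traditionally, some warps are inactive at the beginning and are activated only when other active warps finish.
- In sparse LU, however, all warps must be active from the beginning. A warp waiting for a column that belongs to a not-yet-activated warp would otherwise wait forever.
- This gives an upper bound on the number of concurrent columns.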
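The upper bound on concurrent columns can be estimated from how many warps the GPU can keep resident at once. This is a rough sketch that assumes one warp factorizes one column. The slide does not state that mapping, and all parameter names are hypothetical. The sketch also ignores register allocation granularity, so real limits may be somewhat lower. The example values are illustrative Fermi-class limits: 16 SMs as on the GTX580, up to 48 resident warps and 8 resident blocks per SM, and 32K registers per SM.

```python
def max_concurrent_columns(num_sms, warps_per_block, max_warps_per_sm,
                           max_blocks_per_sm, regs_per_thread, regs_per_sm,
                           smem_per_block=0, smem_per_sm=0, warp_size=32):
    """Upper bound on warps that can be resident on the GPU at the same time.

    Assuming one warp per column, launching at most this many columns keeps
    every column's warp active, so no warp waits on a warp that never starts.
    Register allocation granularity is ignored (a simplification).
    """
    if warps_per_block > max_warps_per_sm:
        raise ValueError("a block does not fit on one SM")
    threads_per_block = warps_per_block * warp_size
    # Blocks per SM are limited by the block cap, the warp cap, registers,
    # and shared memory; the tightest limit wins.
    limits = [max_blocks_per_sm, max_warps_per_sm // warps_per_block]
    if regs_per_thread > 0:
        limits.append(regs_per_sm // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)
    blocks_per_sm = min(limits)
    return num_sms * blocks_per_sm * warps_per_block


print(max_concurrent_columns(16, 6, 48, 8, 20, 32768))  # 768
print(max_concurrent_columns(16, 2, 48, 8, 20, 32768))  # 256
```

With two warps per block, the 8-block limit binds before the 48-warp limit does, so only 256 warps can be resident. The block shape therefore matters as well as the raw warp count.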
7 GPU implementation: memory access pattern
8 GPU implementation: workflow
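9 Performance analysis
Do more concurrent columns give higher performance? No. Beyond some point, the extra columns mostly hold inexecutable operations: updates that must wait for columns that have not finished yet.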
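The effect of inexecutable operations can be illustrated with a toy step-by-step model of pipelined left-looking factorization. This is a hypothetical model, not the authors' performance analysis. Up to `window` columns are active at once, taken in column order. In each step, an active column does one of three things: it applies one update from an already finished column, it finishes itself once all of its updates are in, or it stalls because every remaining update is inexecutable. The model charges nothing for stalls, so it can only show wasted slots, not a slowdown.

```python
def simulate_pipeline(deps, window):
    """Toy model of pipelined left-looking LU.

    deps[k] lists the columns j < k that update column k (the EGraph edges).
    Returns (steps, stalled): total time steps, and total column-steps spent
    waiting because only inexecutable updates were left.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(deps)
    pending = [set(d) for d in deps]  # updates not yet applied to each column
    done = [False] * n
    active, next_col = [], 0
    steps = stalled = 0
    while active or next_col < n:
        # Refill the window in column (topological) order.
        while len(active) < window and next_col < n:
            active.append(next_col)
            next_col += 1
        finishing = []
        for k in active:
            ready = [j for j in pending[k] if done[j]]
            if ready:
                pending[k].remove(min(ready))  # one vector MAD this step
            elif not pending[k]:
                finishing.append(k)            # all updates applied: finish
            else:
                stalled += 1                   # only inexecutable updates left
        # Finished columns become usable by others from the next step on.
        for k in finishing:
            done[k] = True
            active.remove(k)
        steps += 1
    return steps, stalled


chain = [[], [0], [1], [2]]          # each column depends on the previous one
print(simulate_pipeline(chain, 1))   # (7, 0)
print(simulate_pipeline(chain, 4))   # (7, 9)
```

For a chain of dependent columns, widening the window from 1 to 4 saves no steps and adds 9 stalled column-steps. On a GPU, such waiting warps still occupy resident slots and poll memory. That is one plausible reason why the number of concurrent columns has an optimum, stated here as an assumption.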
10 Experiments
- CPU: 2 × Intel Xeon X5680; GPU: NVIDIA GTX580
- Test matrices: University of Florida Sparse Matrix Collection [Florida] (not only circuit matrices)
- Hybrid solver: 1-core / multi-core / many-core (GPU)
[Table: bandwidth (GB/s) for GPU, 1 CPU, 4 CPUs, 8 CPUs, and KLU, by matrix group: A (flop < 200M) and B (flop > 200M)]
13 Hybrid solver
- The platform is chosen from the number of flops in the factorization:
  - Sequential or parallel? [Chen 2011]
  - Single-core, multi-core, or many-core (GPU)
- Accuracy: pivot once, then run several numerical factorizations with that pivoting order. This works because the nonzero values do not change rapidly between successive factorizations. When the nonzero values do vary greatly, pivot (preprocess) again.
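The two decisions on this slide can be sketched as two small policy functions. This is a sketch under stated assumptions, not the solver's actual logic. The thresholds `multicore_threshold` and `gpu_threshold` and the tolerance `rel_change` are hypothetical tuning knobs. The 200M-flop figure used below comes from the experiments' A/B grouping and is not claimed to be the real switching point. The change test simply compares each nonzero's old and new values.

```python
def choose_platform(flops, multicore_threshold, gpu_threshold):
    """Pick where to run the numerical factorization from its flop count.

    The thresholds are hypothetical and would be measured on the target
    machine; the ordering single-core < multi-core < GPU follows the slides.
    """
    if flops < multicore_threshold:
        return "single-core"
    if flops < gpu_threshold:
        return "multi-core"
    return "gpu"


def needs_repivot(old_values, new_values, rel_change=0.5):
    """Decide whether to redo pivoting before the next factorization.

    The pivoting order is reused while the nonzero values change little; a
    large relative change in any nonzero triggers a new pivoting step.
    """
    for old, new in zip(old_values, new_values):
        scale = max(abs(old), abs(new))  # avoids dividing by a zero old value
        if scale > 0.0 and abs(new - old) / scale > rel_change:
            return True
    return False


print(choose_platform(5e7, 2e7, 2e8))           # multi-core
print(needs_repivot([1.0, 2.0], [1.1, 2.0]))    # False: reuse pivoting
print(needs_repivot([1.0, 2.0], [1.0, -2.0]))   # True: pivot again
```

Keeping the two policies separate mirrors the slide. The flop count picks the hardware, while value changes decide only whether the one-time pivoting step must be repeated.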
14 Summary
- Sparse LU solver on GPU
  - Timing order and work partitioning on the GPU
  - The optimal number of concurrent columns
  - Memory access pattern
- Hybrid solver: as the flop count increases, the left-looking factorization should move from 1-core to multi-core to many-core (GPU).
15 Limitations & future work
- Distributed-memory machines (e.g., multiple GPUs)?
  - Limited memory on a single GPU
- Blocked algorithm?
  - Circuit partitioning + blocked factorization
  - Bordered block-diagonal (BBD) matrices
Thank you!
16 References
- [Volkov2008] V. Volkov and J. Demmel, "Benchmarking GPUs to tune dense linear algebra," in SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, IEEE Press, 2008.
- [Tomov2010] S. Tomov, J. Dongarra, and M. Baboulin, "Towards dense linear algebra for hybrid GPU accelerated many-core systems," Parallel Comput., vol. 36, June 2010.
- [Christen2007] M. Christen, O. Schenk, and H. Burkhart, "General-purpose sparse matrix building blocks using the NVIDIA CUDA technology platform," 2007.
- [SuperLU1999] J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu, "A supernodal approach to sparse partial pivoting," SIAM J. Matrix Analysis and Applications, vol. 20, no. 3, 1999.
- [Pardiso2002] O. Schenk and K. Gärtner, "Solving unsymmetric sparse systems of linear equations with PARDISO," Computational Science - ICCS 2002, vol. 2330, 2002.
- [Florida] T. A. Davis and Y. Hu, "The University of Florida sparse matrix collection," to appear in ACM Transactions on Mathematical Software.
17 References (continued)
- [G/P 1988] J. R. Gilbert and T. Peierls, "Sparse partial pivoting in time proportional to arithmetic operations," SIAM J. Sci. Statist. Comput., vol. 9, 1988.
- [KLU2010] T. A. Davis and E. Palamadai Natarajan, "Algorithm 907: KLU, a direct sparse solver for circuit simulation problems," ACM Trans. Math. Softw., vol. 37, pp. 36:1-36:17, September 2010.
- [MC64] I. S. Duff and J. Koster, "The design and use of algorithms for permuting large entries to the diagonal of sparse matrices," SIAM J. Matrix Anal. and Applics, no. 4.
- [AMD] P. R. Amestoy, T. A. Davis, and I. S. Duff, "Algorithm 837: AMD, an approximate minimum degree ordering algorithm," ACM Trans. Math. Softw., vol. 30, September.
- [Chen 2011] X. Chen, W. Wu, Y. Wang, H. Yu, and H. Yang, "An EScheduler-based data dependence analysis and task scheduling for parallel circuit simulation," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 10, Oct. 2011.