Direct Algorithms for Sparse Schur Complements and Inverses
- Kathleen Allison
- 5 years ago
1 Direct Algorithms for Sparse Schur Complements and Inverses
Dr. Ryan Chilton, MyraMath
2 Outline
Examine some less common sparse direct algorithms:
- Partial linear solution.
- Schur complements.
- Sampling the inverse operator.
Apply them as frontends for low-rank skeletonization:
- Cross approximation.
- Range estimation.
- Ritz projection.
Motivation: fast direct solvers for FE-BI and FE-DDM methods.
3 Refresher: Factor $A = LL^T$
Reorder the unknowns of the FEM mesh as Left (0), Right (1), Separator (2). The separator induces the zero blocks $A_{01} = A_{10} = 0$, and they cannot fill in:
$$A = \begin{bmatrix} A_{00} & 0 & A_{02} \\ 0 & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{bmatrix}$$
Algorithm steps (right-looking):
- Factor $A_{00}$ and factor $A_{11}$.
- Solve $A_{20}$ and $A_{21}$, i.e. form $L_{20} = A_{20} L_{00}^{-T}$ and $L_{21} = A_{21} L_{11}^{-T}$.
- Schur downdate $A_{22} \leftarrow A_{22} - L_{20} L_{20}^T - L_{21} L_{21}^T$.
- Factor $A_{22}$.
$A_{00}$ and $A_{11}$ are themselves sparse, so the idea applies recursively. This leads to a tree of operations, eliminated from the bottom up.
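The three-block step can be checked on a toy problem. Below is a dense NumPy sketch (not MyraMath code), assuming a 7-node 1D Laplacian path where node 3 is the separator; `arrow_cholesky` is a hypothetical helper name. It follows the slide's order (factor, solve, downdate, factor) and confirms that the result equals an ordinary Cholesky factor with no fill in the Left/Right block.

```python
import numpy as np

def arrow_cholesky(A00, A11, A20, A21, A22):
    """Right-looking Cholesky of the separator-ordered SPD matrix
        [[A00, 0,   A20.T],
         [0,   A11, A21.T],
         [A20, A21, A22  ]]
    Returns the full lower-triangular factor L (dense, for illustration)."""
    L00 = np.linalg.cholesky(A00)            # factor A00 (independent of A11)
    L11 = np.linalg.cholesky(A11)            # factor A11
    L20 = np.linalg.solve(L00, A20.T).T      # "solve A20": L20 = A20 L00^{-T}
    L21 = np.linalg.solve(L11, A21.T).T      # "solve A21": L21 = A21 L11^{-T}
    S22 = A22 - L20 @ L20.T - L21 @ L21.T    # Schur downdate of the separator
    L22 = np.linalg.cholesky(S22)            # factor the separator block
    n0, n1, n2 = len(A00), len(A11), len(A22)
    L = np.zeros((n0 + n1 + n2, n0 + n1 + n2))
    L[:n0, :n0] = L00
    L[n0:n0 + n1, n0:n0 + n1] = L11          # L10 stays zero: no fill-in
    L[n0 + n1:, :n0] = L20
    L[n0 + n1:, n0:n0 + n1] = L21
    L[n0 + n1:, n0 + n1:] = L22
    return L
```

Usage: build `T = 2*np.eye(7) - np.eye(7, k=1) - np.eye(7, k=-1)`, reorder with `perm = [0, 1, 2, 4, 5, 6, 3]` so node 3 comes last, and pass the five blocks. A real multifrontal solver would recurse into `A00` and `A11` instead of calling a dense Cholesky on them.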
4 Selected profiling data
Example problem under study: an I x J x K brick ($N = IJK$), with $N = 80^3 = 512$K, $N = 100^3 = 1$M, and $N = 128^3 \approx 2.1$M.
- Discrete graph Laplacian (7-point stencil): well-understood spectrum.
- Structured grid: easy to reorder using nested dissection.
(Figure: timings and empirical scaling. Fitted exponents: 3D $O(n^{1.87})$, 2D $O(n^{1.53})$, 1D $O(n^{1.08})$. Timing labels on the slide: GEMM 12 s, 42 s, 161 s, 35 s; and 105 s, 367 s, 1559 s, 405 s.)
Hardware: Intel Xeon, 16 cores at 2.4 GHz, MKL.
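To make the model problem concrete, here is a small dense NumPy construction of a 7-point Laplacian on an I x J x K brick as a Kronecker sum, checked against its closed-form spectrum. Two assumptions not stated on the slide: a Dirichlet boundary (every diagonal entry is 6, so the matrix is SPD and $A = LL^T$ exists), and dense storage, which is only sensible for tiny bricks.

```python
import numpy as np

def laplacian_1d(n):
    # Dirichlet second-difference matrix tridiag(-1, 2, -1)
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def laplacian_3d(I, J, K):
    """Dense 7-point Laplacian on an I x J x K brick (N = I*J*K), as a Kronecker sum."""
    A = np.kron(np.kron(laplacian_1d(I), np.eye(J)), np.eye(K))
    A += np.kron(np.kron(np.eye(I), laplacian_1d(J)), np.eye(K))
    A += np.kron(np.kron(np.eye(I), np.eye(J)), laplacian_1d(K))
    return A

def exact_eigenvalues(I, J, K):
    # Kronecker-sum eigenvalues are sums of the 1D ones, 4 sin^2(pi m / (2(n+1)))
    lam = lambda n: 4.0 * np.sin(np.pi * np.arange(1, n + 1) / (2 * (n + 1))) ** 2
    a, b, c = lam(I), lam(J), lam(K)
    return np.sort((a[:, None, None] + b[None, :, None] + c[None, None, :]).ravel())
```

For example, `laplacian_3d(3, 4, 5)` is 60 x 60, and `np.linalg.eigvalsh` on it matches `exact_eigenvalues(3, 4, 5)`. This known spectrum is what makes the problem a convenient benchmark.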
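5 Partial solution $x(i) = R_i^T A^{-1} R_j\, b(j)$
In plain English: only $b(j)$ is nonzero, and only $x(i)$ is needed. Many engineering QoIs (quantities of interest) use only boundary-valued $b$ and $x$.
(Figure: forward sweep $Lx = b$ and backward sweep $L^T x = b$ for a full solve vs. a partial solve, highlighting the $j$ and $i$ entries.)
Cost: $O(n^{4/3})$ time, like $x = A^{-1} b$, but only $O(n^{2/3})$ space per right-hand side, not $O(n)$.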
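The definition can be pinned down with a dense reference implementation. The NumPy sketch below is an assumption-laden stand-in, not MyraMath's `partialsolve()`: it builds the restriction matrices $R_i$, $R_j$ explicitly and does a full solve, so it shows *what* is computed but none of the sparse savings. `restriction` and `partial_solve_dense` are hypothetical names.

```python
import numpy as np

def restriction(n, idx):
    """R: n x len(idx) injection (columns of the identity), so R.T @ x == x[idx]."""
    return np.eye(n)[:, idx]

def partial_solve_dense(A, i, j, bj):
    """Dense reference for x(i) = R_i^T A^{-1} R_j b(j).
    A sparse partial solve would skip work that this version still does."""
    n = A.shape[0]
    b = restriction(n, j) @ bj          # scatter b(j) into a full-length b
    x = np.linalg.solve(A, b)           # full solve, standing in for partialsolve()
    return restriction(n, i).T @ x      # keep only x(i)
```

With `A` a 10 x 10 tridiagonal Laplacian, `i = [0, 1]`, and `j = [8, 9]`, the result equals `np.linalg.solve(A, b)[i]` for the scattered `b`.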
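6 Schur complement $S = B^T A^{-1} B$
Concept: form the saddle-point system $\begin{bmatrix} A & B \\ B^T & 0 \end{bmatrix}$, then quit early: eliminate the $A$ block and stop before factoring the trailing block, which then holds $-B^T A^{-1} B$.
These arise from FE-BI hybrids, e.g. scattering from apertures.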
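The "quit early" idea is easy to demonstrate densely. This NumPy sketch (an illustration, not MyraMath's `schur()`) runs unpivoted Gaussian elimination over only the first $n$ pivots of the saddle system and reads the Schur complement off the trailing block. It also accepts a second operand $C$ for the $B^T A^{-1} C$ variant. Unpivoted elimination is assumed safe because $A$ is taken to be SPD. `schur_via_saddle` is a hypothetical name.

```python
import numpy as np

def schur_via_saddle(A, B, C=None):
    """Return S = B^T A^{-1} C (C defaults to B) by eliminating the A block of
    the saddle system [[A, C], [B^T, 0]] and stopping before the (2,2) block."""
    C = B if C is None else C
    n = A.shape[0]
    K = np.block([[A, C], [B.T, np.zeros((B.shape[1], C.shape[1]))]])
    for p in range(n):                  # eliminate the first n pivots only
        K[p + 1:, :] -= np.outer(K[p + 1:, p] / K[p, p], K[p, :])
    return -K[n:, n:]                   # trailing block now holds -B^T A^{-1} C
```

In a sparse code, the elimination of $A$ is the ordinary multifrontal factorization, and "quitting early" simply skips factoring the final front.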
7 Sampling the inverse $Z(i,j)$, $Z = A^{-1}$
Closely related to the Schur complement: $Z(i,j) = R_i^T A^{-1} R_j$.
Arises in FETI/DDM, which iterate by exchanging fields at boundaries: each exchange is a scatter, solve, gather.
Tabulating $Z(i,j)$ opens up reuse and preconditioning options.
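Tabulation amounts to running the scatter, solve, gather cycle once per column of $j$. Here is a dense NumPy sketch, assuming a full `np.linalg.solve` stands in for the sparse solver; `tabulate_inverse_block` is a hypothetical name.

```python
import numpy as np

def tabulate_inverse_block(A, i, j):
    """Tabulate Z(i,j) = R_i^T A^{-1} R_j one column at a time."""
    n = A.shape[0]
    Z = np.empty((len(i), len(j)))
    for col, jj in enumerate(j):
        b = np.zeros(n)
        b[jj] = 1.0                     # scatter a unit vector into slot jj
        x = np.linalg.solve(A, b)       # solve
        Z[:, col] = x[i]                # gather the i entries
    return Z
```

Once `Z` is stored, each further boundary exchange costs a small dense product `Z @ bj` instead of another sparse solve. That is the kind of reuse the slide points to.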
8 Cross Approximating $Z(i,j)$ [1/2]
Alternately sample the row/column with the largest error modulus.
(Figure: $\log_{10}\|Z - UV^T\|$, estimated error vs. actual error, compared with SVD(Z).)
Key idea: partialsolve() can efficiently extract rows and columns:
  c = Z([i],j) = solver.partialsolve([i],j,x=1.0,'left')
  r = Z(i,[j]) = solver.partialsolve(i,[j],x=1.0,'right')
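Below is a minimal NumPy sketch of adaptive cross approximation (ACA) with partial pivoting, the textbook form of "alternately sample the row/column with the largest error modulus". The callbacks `get_row(i)` and `get_col(j)` are hypothetical. In the slide's setting they would wrap the two `partialsolve()` calls listed above. The stopping rule, which compares the newest cross against a running Frobenius-norm estimate, is a common heuristic and an assumption here.

```python
import numpy as np

def aca(get_row, get_col, m, n, tol=1e-8, max_rank=None):
    """ACA with partial pivoting: Z ~= U @ V.T, touching only rows/columns of Z.
    get_row(i) must return Z[i, :]; get_col(j) must return Z[:, j]."""
    max_rank = min(m, n) if max_rank is None else max_rank
    U, V = np.zeros((m, 0)), np.zeros((n, 0))
    used_rows = {0}
    i, norm2 = 0, 0.0                      # norm2 estimates ||U V^T||_F^2
    for _ in range(max_rank):
        r = get_row(i) - U[i, :] @ V.T     # residual of row i
        j = int(np.argmax(np.abs(r)))      # column with largest error modulus
        if r[j] == 0.0:
            break                          # row reproduced exactly; stop (sketch only)
        v = r / r[j]
        u = get_col(j) - U @ V[j, :]       # residual of column j
        norm2 += 2.0 * np.sum((U.T @ u) * (V.T @ v)) + (u @ u) * (v @ v)
        U, V = np.column_stack([U, u]), np.column_stack([V, v])
        if np.linalg.norm(u) * np.linalg.norm(v) <= tol * np.sqrt(norm2):
            break                          # newest cross is negligible: converged
        cand = np.abs(u)
        cand[list(used_rows)] = -1.0       # never revisit a pivot row
        i = int(np.argmax(cand))           # next row: largest error modulus
        used_rows.add(i)
    return U, V
```

As a stand-in for $Z(i,j)$, try a smooth kernel between well-separated point sets, e.g. `Z = 1 / (y[None, :] - x[:, None])` with `x` in [0, 1] and `y` in [3, 4]. ACA recovers it to high accuracy while touching only a few dozen rows and columns.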
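9 Cross Approximating $Z(i,j)$ [2/2]
Beats solver.inverse() at large N, especially at low rank/tolerance.
(Figure: timings at 8, 6, and 4 digits of accuracy.)
In parallel the gap narrows because of BLAS3 vs. BLAS1 effects: inverse() runs on matrix-matrix kernels that scale well across cores, while cross approximation extracts one row or column at a time.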
10 Range estimation of $Z(i,j)$ [1/2]
Apply the action of Z to random vectors X to form the image $Y = ZX$. If Z has rapidly decaying singular values σ, Y probably spans range(Z).
  // Find Q = span(Z)
  X = rand(Z.cols,k)
  Y = Z.apply(X)
  [Q,R,π] = QR(Y,0)
  // Build k-SVD from Q
  W = Q^T Z
  [U,Σ,V] = svd(W,0)
  Z ≈ (Q U) Σ V^T
(Figure: error of SVD(UV^T) vs. SVD(Z) for k = 4, 8, 16, 32, over Pass 1 and Pass 2.)
Key idea: partialsolve() can efficiently apply $Y = Z(i,j) X$:
  Y = Z([i],[j])*X = solver.partialsolve([i],[j],x,'left')
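The two passes above follow the standard randomized range finder. A NumPy sketch is shown below, assuming two black-box products, `apply_Z(X) = Z @ X` and `apply_Zt(X) = Z.T @ X` (hypothetical callbacks; in the slide's setting the first would be the block `partialsolve()` call). One deliberate difference: it uses unpivoted `np.linalg.qr` where the slide uses a pivoted QR.

```python
import numpy as np

def randomized_svd(apply_Z, apply_Zt, n_cols, k, rng=None):
    """Rank-k SVD of Z using only products with Z and Z^T."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n_cols, k))   # random test vectors
    Y = apply_Z(X)                         # pass 1: image Y = Z X samples range(Z)
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis, Q ~ span(Z)
    W = apply_Zt(Q).T                      # pass 2: W = Q^T Z  (k x n_cols)
    Uw, s, Vt = np.linalg.svd(W, full_matrices=False)
    return Q @ Uw, s, Vt                   # Z ~ (Q Uw) diag(s) Vt
```

When Z has exact rank at most k (for example a 60 x 40 product of rank-5 factors with k = 8), `U @ np.diag(s) @ Vt` reproduces Z to rounding error, and the leading entries of `s` match `np.linalg.svd(Z)`.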
11 Range estimation of $Z(i,j)$ [2/2]
Same problem instances as before (sizes, shapes).
(Figure: timings at 8, 6, and 4 digits of accuracy.)
Having all the forcing data X available up front leads to a speedup, because all k right-hand sides can pass through partialsolve() as one block.
Can be faster than the parallel solver.inverse(), even at modest N.
12 Ritz Projection of $Z(i,j)$ [1/3]
What about approximating more than just one block? Block Low-Rank (BLR) and Hierarchical (H) matrices.
Optimization (BLR) and amortization (H) opportunities do exist.
13 Ritz Projection of $Z(i,j)$ [2/3]
All of the exterior is partitioned into (leaf) groups G0, G1, G2, G3.
First pass: find row/column spans using a fat partialsolve() (many right-hand sides at once):
- $Y(3,0)$ = colspan $Z(3,0)$
- $Y(0,3)$ = colspan $Z(0,3)$ = rowspan $Z(3,0)$
Second pass: Ritz projection using solver.schur() and a k-SVD:
  R = Y(3,0)^T Z(3,0) Y(0,3)      (k x k)
  R = solver.schur(Y30,Y03)
  [U,Σ,V] = svd(R)
  Z(3,0) ≈ (Y30 U) Σ (Y03 V)^T
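The second pass is a small projected SVD. The NumPy sketch below, `ritz_compress` (a hypothetical name), assumes orthonormal bases for the column and row spans are already in hand. It forms the k x k matrix $R = Y_c^T Z Y_r$ explicitly, whereas on the slide that product comes from `solver.schur()` without ever forming Z. When the bases capture the spans exactly, lifting the SVD of R back reproduces the block.

```python
import numpy as np

def ritz_compress(Z, Y_col, Y_row):
    """Given orthonormal Y_col ~ colspan(Z) and Y_row ~ rowspan(Z), return
    factors with Z ~ (Y_col U) diag(s) (Y_row V)^T, where R = U diag(s) V^T."""
    R = Y_col.T @ Z @ Y_row            # k x k Ritz projection (slide: solver.schur)
    U, s, Vt = np.linalg.svd(R)
    return Y_col @ U, s, Y_row @ Vt.T
```

In practice `Y_col` and `Y_row` would come from the first-pass range estimates. Trailing entries of `s` that are negligible can then be truncated to shrink the rank further.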
14 Ritz Projection of $Z(i,j)$ [3/3]
Fill an H-matrix representation of Z restricted to the boundary.
(Figure: time breakdown by phase: Factor; Form Y [partialsolve]; Form B [schur, qr, svd]; Form Z [inverse]; labeled 1385 sec.)
The algorithm quickly furnishes all (admissible) blocks.
An H-matrix of $S = B^T A^{-1} B$ can be formed with a few minor changes.
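To show what "furnishing all blocks" buys, here is a flat block-low-rank (BLR) sketch in NumPy, which is simpler than the hierarchical (H) layout on the slide. Diagonal blocks stay dense, and every off-diagonal block is stored as thin factors. A truncated dense SVD stands in for the cross approximation, range estimation, or Ritz projection frontends. `blr_compress` and `blr_matvec` are hypothetical names, and the test kernel `exp(-|t_i - t_j|)` is an assumption, chosen because its off-diagonal blocks are exactly rank 1.

```python
import numpy as np

def blr_compress(Z, groups, tol=1e-10):
    """Flat BLR representation of Z over index groups (dense SVD as a stand-in)."""
    blocks = {}
    for a, ga in enumerate(groups):
        for b, gb in enumerate(groups):
            Zab = Z[np.ix_(ga, gb)]
            if a == b:
                blocks[a, b] = ("dense", Zab)       # near field: keep dense
                continue
            U, s, Vt = np.linalg.svd(Zab, full_matrices=False)
            r = max(1, int(np.sum(s > tol * s[0])))  # relative truncation
            blocks[a, b] = ("lowrank", (U[:, :r] * s[:r], Vt[:r, :]))
    return blocks

def blr_matvec(blocks, groups, x):
    """y = Z @ x evaluated block by block from the BLR representation."""
    y = np.zeros_like(x, dtype=float)
    for (a, b), (kind, data) in blocks.items():
        xb = x[groups[b]]
        if kind == "dense":
            y[groups[a]] += data @ xb
        else:
            Us, Vt = data
            y[groups[a]] += Us @ (Vt @ xb)           # two thin products
    return y
```

With 120 points split into four groups of 30, every off-diagonal block compresses to rank 1, and `blr_matvec` agrees with `Z @ x`.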
15 Wrapping Up
Examined several uncommon sparse direct algorithms:
- Partial linear solution: $x(i) = R_i^T A^{-1} R_j\, b(j)$ (sparse b, sifted x).
- Schur complements: $B^T A^{-1} B$, $B^T A^{-1} C$, all sparse.
- Sampling the inverse operator: $Z(i,j) = R_i^T A^{-1} R_j$.
Used them as frontends for low-rank skeletonization:
- Cross approximation: partialsolve() can extract rows/columns.
- Range estimation: partialsolve() can apply Z(i,j) quickly.
- Ritz projection: schur() + partialsolve(), amortized over blocks.
Essential tools for FE-BI/DDM methods (sparsity + low rank).
16 Contact
- MyraMath: sparse factor/solve/schur/inverse/partialsolve.
- MyraKL: BLAS/LAPACK API for MyraMath, or use MKL.
Free software (GPL), or dual license.