Paralution & ViennaCL
1 Paralution & ViennaCL
Clemens Schiffer (Uni Graz)
June 12, 2014

2 Introduction (Paralution)

3 Idea of Paralution
- Package for iterative solvers/preconditioners
- Additional abstraction layer between the user's preferred program and varying hardware
- Code independent of platform and hardware backend
- Futureproof

4 Installation
- No root access required
- Install using make/cmake
- Library & header based
- I had to specify the CUDA root directory:
    cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ..
- Set the environment variable:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/paralution/build/lib

5 Using Paralution: Basic Structure
    #include <paralution.hpp>

    using namespace paralution;

    int main(int argc, char* argv[]) {
      init_paralution();
      info_paralution();  // optional

      // your paralution code
      // goes here

      stop_paralution();
      return 0;
    }

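Every init_paralution() should be matched by a stop_paralution(), also on early returns or exceptions. A minimal sketch of one way to guarantee this in C++ is a scope guard; the LibraryGuard class below is a hypothetical helper, not part of Paralution, and simply takes the two calls as callables.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not part of PARALUTION): runs `init` on construction
// and `stop` on destruction, so the shutdown call cannot be skipped by an
// early return or an exception.
//
// Assumed usage as the first statement of main():
//   LibraryGuard guard([] { paralution::init_paralution(); },
//                      [] { paralution::stop_paralution(); });
// Objects declared after the guard are destroyed before `stop` runs,
// because C++ destroys locals in reverse order of declaration.
class LibraryGuard {
public:
  LibraryGuard(std::function<void()> init, std::function<void()> stop)
      : stop_(std::move(stop)) {
    init();
  }

  ~LibraryGuard() {
    if (stop_) stop_();
  }

  // Non-copyable: exactly one stop call per init call.
  LibraryGuard(const LibraryGuard&) = delete;
  LibraryGuard& operator=(const LibraryGuard&) = delete;

private:
  std::function<void()> stop_;
};
```

Lambdas are used in the usage comment rather than plain function pointers, so the sketch does not depend on whether the library functions have default arguments.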
6 Compilation and Linking
Use
    g++ -O3 -Wall -I/paralution/build/inc -c main.cpp -o main.o
    g++ -o main main.o -L/paralution/build/lib/ -lparalution -lOpenCL
or modify your Makefile:
    CXXFLAGS += -I/paralution/build/inc
    LINKFLAGS += -L/paralution/build/lib/ -lparalution -lOpenCL

7 Output of info_paralution()
    Number of CPU cores: 8
    Host thread affinity policy - thread mapping on every core
    Number of GPU devices in the system: 1
    PARALUTION ver
    PARALUTION platform is initialized
    Accelerator backend: GPU(CUDA)
    OpenMP threads:8
    Selected GPU device:
    Device number: 0
    Device name: GeForce GTX 680
    totalglobalmem: 4095 MByte
    clockrate:
    compute capability: 3.0
    ECCEnabled:

8 Simple Example: Apply Matrix to Vector
    LocalVector<double> x;
    LocalVector<double> y;
    LocalMatrix<double> mat;

    mat.ReadFileMTX("my_matrix.mtx");
    x.ReadFileASCII("my_vector.dat");
    y.Allocate("rhs", mat.get_nrow());

    mat.Apply(x, &y);  // perform y <- Ax

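What a matrix-vector product like Apply computes can be sketched on the host for a matrix in CSR (compressed sparse row) format, the format mat.info() reports for the example matrix. The CsrMatrix struct and csr_apply function are illustrative names, not Paralution's internal types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix: the nonzeros of row i are val[row_ptr[i] .. row_ptr[i+1])
// with column indices col_idx[...] (illustrative, not PARALUTION's type).
struct CsrMatrix {
  std::size_t nrow = 0, ncol = 0;
  std::vector<std::size_t> row_ptr;  // size nrow + 1
  std::vector<std::size_t> col_idx;  // size nnz
  std::vector<double> val;           // size nnz
};

// y = A * x, computed row by row.
void csr_apply(const CsrMatrix& A, const std::vector<double>& x,
               std::vector<double>& y) {
  assert(x.size() == A.ncol);
  y.assign(A.nrow, 0.0);
  for (std::size_t i = 0; i < A.nrow; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.val[k] * x[A.col_idx[k]];
    y[i] = sum;
  }
}
```

Each row is independent of the others, which is why this operation maps well onto OpenMP threads or GPU threads.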
9 Simple Example: On the Accelerator (GPU, ...)
    LocalVector<double> x;
    LocalVector<double> y;
    LocalMatrix<double> mat;

    mat.ReadFileMTX("my_matrix.mtx");
    x.ReadFileASCII("my_vector.dat");
    y.Allocate("rhs", mat.get_nrow());

    mat.MoveToAccelerator();
    x.MoveToAccelerator();
    y.MoveToAccelerator();

    mat.Apply(x, &y);  // perform y <- Ax on the accelerator

10 Output of mat.info()
Calling mat.info() produces output such as:
    LocalMatrix name=l100.mtx; rows=10000; cols=10000; nnz=49600; prec=64bit; asm=no; format=csr; host backend={cpu(openmp)}; accelerator backend={opencl}; current=opencl
If an operation cannot be performed efficiently on the accelerator, a warning is printed and the operation is performed on the host:
    *** warning: LocalMatrix::ConvertTo() is performed on the host

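The numbers in this header (rows=10000, nnz=49600) are consistent with a five-point Laplacian on a 100 x 100 grid, which has n² rows and 5n² - 4n nonzeros. The slide does not say what l100.mtx contains, so this is an assumption. The hypothetical generator below builds that matrix in CSR and can be used to check the count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CsrMatrix {
  std::size_t nrow = 0, ncol = 0;
  std::vector<std::size_t> row_ptr, col_idx;
  std::vector<double> val;
};

// Hypothetical generator: 2D five-point Laplacian on an n x n grid
// (Dirichlet boundary, lexicographic ordering), stored in CSR.
CsrMatrix laplace_2d(std::size_t n) {
  CsrMatrix A;
  A.nrow = A.ncol = n * n;
  A.row_ptr.push_back(0);
  for (std::size_t r = 0; r < n; ++r) {
    for (std::size_t c = 0; c < n; ++c) {
      const std::size_t i = r * n + c;
      // Columns in ascending order: up, left, centre, right, down.
      if (r > 0)     { A.col_idx.push_back(i - n); A.val.push_back(-1.0); }
      if (c > 0)     { A.col_idx.push_back(i - 1); A.val.push_back(-1.0); }
      A.col_idx.push_back(i);
      A.val.push_back(4.0);
      if (c + 1 < n) { A.col_idx.push_back(i + 1); A.val.push_back(-1.0); }
      if (r + 1 < n) { A.col_idx.push_back(i + n); A.val.push_back(-1.0); }
      A.row_ptr.push_back(A.col_idx.size());  // end of row i
    }
  }
  return A;
}
```

The count follows from n² diagonal entries plus two off-diagonal entries per grid edge, 2 · 2n(n - 1), giving 5n² - 4n = 49600 for n = 100.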
12 Linear Solver: CG
    CG<LocalMatrix<double>, LocalVector<double>, double> ls;

    ls.SetOperator(mat);
    ls.Build();

    ls.Solve(rhs, &x);  // solve Ax = rhs

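For reference, a minimal host-side sketch of the unpreconditioned conjugate gradient method that the CG solver object implements, for a symmetric positive definite matrix in CSR format. The function names and the plain relative-residual stopping rule are assumptions for illustration, not Paralution's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct CsrMatrix {
  std::size_t n = 0;  // square n x n
  std::vector<std::size_t> row_ptr, col_idx;
  std::vector<double> val;
};

void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
  for (std::size_t i = 0; i < A.n; ++i) {
    double s = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      s += A.val[k] * x[A.col_idx[k]];
    y[i] = s;
  }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Conjugate gradient for SPD A. x holds the initial guess on entry and the
// approximation on exit. Stops when ||r|| <= rel_tol * ||r_0||.
// Returns the number of iterations performed.
int cg_solve(const CsrMatrix& A, const std::vector<double>& b,
             std::vector<double>& x, double rel_tol, int max_iter) {
  assert(b.size() == A.n && x.size() == A.n);
  std::vector<double> r(A.n), p(A.n), Ap(A.n);
  spmv(A, x, Ap);
  for (std::size_t i = 0; i < A.n; ++i) r[i] = b[i] - Ap[i];
  p = r;
  double rr = dot(r, r);
  const double r0 = std::sqrt(rr);
  if (r0 == 0.0) return 0;
  for (int it = 1; it <= max_iter; ++it) {
    spmv(A, p, Ap);
    const double alpha = rr / dot(p, Ap);  // step length along p
    for (std::size_t i = 0; i < A.n; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    const double rr_new = dot(r, r);
    if (std::sqrt(rr_new) <= rel_tol * r0) return it;
    const double beta = rr_new / rr;  // keeps directions A-conjugate
    for (std::size_t i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
    rr = rr_new;
  }
  return max_iter;
}
```

Each iteration needs one sparse matrix-vector product and two dot products, which is why the same solver code runs well on whichever backend provides those kernels.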
13 Linear Solver: PCG
    CG<LocalMatrix<double>, LocalVector<double>, double> ls;
    Jacobi<LocalMatrix<double>, LocalVector<double>, double> p;

    ls.SetOperator(mat);
    ls.SetPreconditioner(p);
    ls.Build();

    ls.Solve(rhs, &x);  // solve Ax = rhs

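The Jacobi preconditioner is simply M = diag(A). In PCG, each iteration applies z = M⁻¹r and builds the search direction from z instead of r. A sketch of the build/apply pair on a CSR matrix follows; the class is illustrative, not Paralution's Jacobi implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct CsrMatrix {
  std::size_t n = 0;
  std::vector<std::size_t> row_ptr, col_idx;
  std::vector<double> val;
};

// Jacobi (diagonal) preconditioner sketch: build() stores 1 / a_ii,
// apply() computes z = M^{-1} r entry by entry.
class JacobiPreconditioner {
public:
  void build(const CsrMatrix& A) {
    inv_diag_.assign(A.n, 0.0);
    for (std::size_t i = 0; i < A.n; ++i) {
      double d = 0.0;
      for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
        if (A.col_idx[k] == i) d += A.val[k];  // diagonal entry of row i
      if (d == 0.0) throw std::runtime_error("zero diagonal entry");
      inv_diag_[i] = 1.0 / d;
    }
  }

  void apply(const std::vector<double>& r, std::vector<double>& z) const {
    assert(r.size() == inv_diag_.size());
    z.resize(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) z[i] = inv_diag_[i] * r[i];
  }

private:
  std::vector<double> inv_diag_;
};
```

Applying it is a single element-wise scaling, so it parallelizes trivially; the price is that it only captures the diagonal of A.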
14 Custom Iteration Control
    CG<LocalMatrix<double>, LocalVector<double>, double> ls;
    Jacobi<LocalMatrix<double>, LocalVector<double>, double> p;

    ls.Init(1e-10,   // abs_tol
            1e-8,    // rel_tol
            1e+8,    // div_tol
            10000);  // max_iter

    ls.SetOperator(mat);
    ls.SetPreconditioner(p);
    ls.Build();

    ls.Solve(rhs, &x);  // solve Ax = rhs

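How the four Init() parameters interact can be sketched as a stopping test on the residual norm. This assumes the common convention of comparing ||r|| against abs_tol, against rel_tol·||r₀||, and against div_tol·||r₀|| for divergence. The exact rule Paralution applies is not shown on the slide, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

enum class SolverStatus { Running, AbsConverged, RelConverged, Diverged, MaxIter };

// Hypothetical stopping test modelled on Init(abs_tol, rel_tol, div_tol, max_iter).
// Assumed convention: compare the current residual norm with abs_tol, with
// rel_tol * ||r_0||, and with div_tol * ||r_0|| (divergence).
struct IterationControl {
  double abs_tol, rel_tol, div_tol;
  int max_iter;

  SolverStatus check(double res_norm, double res0_norm, int iter) const {
    if (res_norm <= abs_tol) return SolverStatus::AbsConverged;
    if (res_norm <= rel_tol * res0_norm) return SolverStatus::RelConverged;
    // NaN fails both comparisons above, so treat it as divergence here.
    if (res_norm >= div_tol * res0_norm || std::isnan(res_norm))
      return SolverStatus::Diverged;
    if (iter >= max_iter) return SolverStatus::MaxIter;
    return SolverStatus::Running;
  }
};
```

With the slide's values, IterationControl{1e-10, 1e-8, 1e+8, 10000} stops at whichever criterion is hit first. The large div_tol only catches a residual that has grown by eight orders of magnitude.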
15 Available Solvers/Preconditioners

17 Switching the Backend
- No recompilation needed, just switch the library.
- E.g. with a second installation in /paralution_cl, configured with
    cmake -DSUPPORT_CUDA=OFF -DSUPPORT_OCL=ON ..
  just changing the library path
    export LD_LIBRARY_PATH=/paralution_cl/build/lib:$LD_LIBRARY_PATH
  will make the executable run using OpenCL. The OpenCL build's directory must come before (or replace) the CUDA build's directory, because the dynamic loader uses the first matching library it finds in LD_LIBRARY_PATH.

18 MATLAB Plug-in
- Consists of an example file, paralution_pcg.cpp, that can easily be modified
- Compile it into a MEX file
- It can then be called in MATLAB like a normal function

19 MATLAB Plug-in: Details
Required some extra attention:
- Finding mex:
    export PATH=$PATH:/usr/local/MATLAB/R2013a/bin
- Using an older compiler:
    sudo rm /usr/bin/gcc
    sudo rm /usr/bin/g++
    sudo ln -s /usr/bin/gcc-4.4 /usr/bin/gcc
    sudo ln -s /usr/bin/g++-4.4 /usr/bin/g++
- Using the system libstdc++ instead of MATLAB's bundled copy:
    cd /usr/local/MATLAB/R2013a/sys/os/glnxa64
    sudo unlink libstdc++.so.6
    sudo ln -s /usr/lib/libstdc++.so.6

20 Other Plug-ins
- FORTRAN
- OpenFOAM
- deal.II
- Elmer
- Hermes/Agros2D

21 Pros and Cons
Pros:
- Easy to use
- Portable
- Open source
- Many preconditioners/solvers
Cons:
- No MPI yet
- No stencils
- (No CUDA for compute capability < 2.0)
- In development
- Futureproof...?

22 Introduction (ViennaCL)

23 Idea of ViennaCL
- Linear algebra and iterative solvers/preconditioners
- Additional abstraction layer between the user's preferred program and varying hardware
- Code independent of platform and hardware backend
- Header based

24 Details
- More linear algebra: dense matrices, slicing, extraction, etc.
- Compatible with Boost.uBLAS: just changing the namespace is enough
- Completely header based, no installation needed
- CMake is only required to build the examples (highly recommended, though)

25 Simple Example
    #include <vector>
    #include "viennacl/scalar.hpp"
    #include "viennacl/vector.hpp"
    #include "viennacl/matrix.hpp"
    #include "viennacl/linalg/prod.hpp"
    //...
    using namespace viennacl;
    //...
    typedef float ScalarType;

    matrix<ScalarType> vcl_a(N, M);
    vector<ScalarType> vcl_x(M);
    vector<ScalarType> vcl_rhs(N);

    std::vector<ScalarType> stl_x(M);    // standard vectors ensure
    std::vector<ScalarType> stl_a(N*M);  // linear memory -> fast_copy

    //... fill with data

    fast_copy(&(stl_a[0]), &(stl_a[0]) + stl_a.size(), vcl_a);
    fast_copy(&(stl_x[0]), &(stl_x[0]) + stl_x.size(), vcl_x);

    vcl_rhs = linalg::prod(vcl_a, vcl_x);

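The point of filling a std::vector of size N*M is that the whole matrix sits in one contiguous block, so it can be transferred in a single copy. A host-side sketch of the same product, assuming row-major storage (entry (i, j) at index i*M + j), which is ViennaCL's default matrix layout; dense_prod is a hypothetical name, not ViennaCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense N x M matrix stored row by row in one contiguous std::vector:
// entry (i, j) lives at index i * M + j. Computes rhs = a * x.
// Host-side sketch of what linalg::prod computes, not ViennaCL code.
std::vector<float> dense_prod(const std::vector<float>& a, std::size_t N,
                              std::size_t M, const std::vector<float>& x) {
  assert(a.size() == N * M && x.size() == M);
  std::vector<float> rhs(N, 0.0f);
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      rhs[i] += a[i * M + j] * x[j];
  return rhs;
}
```

If the data were instead held as a vector of row vectors, each row would live in a separate allocation and would need its own copy to the device.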
26 Direct Solvers
    using namespace viennacl;

    matrix<ScalarType> vcl_a;
    vector<ScalarType> vcl_rhs;
    //... fill with data

    // LU factorization (in place), then forward/backward substitution;
    // the solution overwrites vcl_rhs:
    linalg::lu_factorize(vcl_a);
    linalg::lu_substitute(vcl_a, vcl_rhs);

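What the factorize/substitute pair does can be sketched on the host for a dense row-major matrix. This sketch assumes no pivoting (Doolittle form, unit diagonal in L), so a zero pivot makes it fail; the slide does not say whether ViennaCL pivots. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// In-place LU factorization without pivoting (Doolittle). Afterwards the
// strict lower triangle holds L (unit diagonal implied) and the upper
// triangle holds U. Row-major n x n storage.
void lu_factorize_inplace(std::vector<double>& a, std::size_t n) {
  assert(a.size() == n * n);
  for (std::size_t k = 0; k < n; ++k) {
    const double pivot = a[k * n + k];
    if (pivot == 0.0) throw std::runtime_error("zero pivot (no pivoting)");
    for (std::size_t i = k + 1; i < n; ++i) {
      a[i * n + k] /= pivot;  // multiplier l_ik
      for (std::size_t j = k + 1; j < n; ++j)
        a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
  }
}

// Solves L U x = b using the factors from lu_factorize_inplace;
// b is overwritten with x.
void lu_substitute_inplace(const std::vector<double>& lu, std::size_t n,
                           std::vector<double>& b) {
  assert(lu.size() == n * n && b.size() == n);
  for (std::size_t i = 0; i < n; ++i)  // forward: L y = b (unit diagonal)
    for (std::size_t j = 0; j < i; ++j) b[i] -= lu[i * n + j] * b[j];
  for (std::size_t i = n; i-- > 0;) {  // backward: U x = y
    for (std::size_t j = i + 1; j < n; ++j) b[i] -= lu[i * n + j] * b[j];
    b[i] /= lu[i * n + i];
  }
}
```

The factorization costs O(n³) once; each further right-hand side then needs only the O(n²) substitutions.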
27 Iterative Solvers
    using namespace viennacl::linalg;
    //...
    viennacl::compressed_matrix<ScalarType> vcl_matrix;
    //... fill with data

    // conjugate gradient:
    vcl_result = solve(vcl_matrix, vcl_rhs, cg_tag());
    // BiCGStab:
    vcl_result = solve(vcl_matrix, vcl_rhs, bicgstab_tag());
    // GMRES:
    vcl_result = solve(vcl_matrix, vcl_rhs, gmres_tag());

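The solver is selected by the type of the tag passed as the third argument, so the choice is made at compile time by overload resolution (tag dispatch). A self-contained sketch of this calling style follows. To keep it short, it swaps in two simpler methods, Jacobi and Gauss-Seidel iteration, instead of CG/BiCGStab/GMRES. All names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tags: the tag's type selects the overload of solve(),
// and its members carry algorithm parameters.
struct jacobi_tag { int iterations = 50; };
struct gauss_seidel_tag { int iterations = 50; };

using Dense = std::vector<double>;  // row-major n x n matrix or length-n vector

Dense solve(const Dense& A, const Dense& b, jacobi_tag tag) {
  const std::size_t n = b.size();
  Dense x(n, 0.0), x_new(n);
  for (int it = 0; it < tag.iterations; ++it) {
    for (std::size_t i = 0; i < n; ++i) {
      double s = b[i];
      for (std::size_t j = 0; j < n; ++j)
        if (j != i) s -= A[i * n + j] * x[j];  // uses old iterate only
      x_new[i] = s / A[i * n + i];
    }
    x.swap(x_new);
  }
  return x;
}

Dense solve(const Dense& A, const Dense& b, gauss_seidel_tag tag) {
  const std::size_t n = b.size();
  Dense x(n, 0.0);
  for (int it = 0; it < tag.iterations; ++it)
    for (std::size_t i = 0; i < n; ++i) {
      double s = b[i];
      for (std::size_t j = 0; j < n; ++j)
        if (j != i) s -= A[i * n + j] * x[j];  // x[j], j < i, already updated
      x[i] = s / A[i * n + i];
    }
  return x;
}
```

Switching algorithms changes only the tag at the call site, which mirrors how the slide switches between cg_tag(), bicgstab_tag() and gmres_tag().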
28 Iteration Control
    using namespace viennacl::linalg;
    //...
    viennacl::compressed_matrix<ScalarType> vcl_matrix;
    //... fill with data
    cg_tag custom_cg(1e-10, 100);  // relative tolerance, max. iterations

    // conjugate gradient:
    vcl_result = solve(vcl_matrix, vcl_rhs, custom_cg);
    std::cout << "No. of iters: " << custom_cg.iters() << std::endl;
    std::cout << "Est. error: " << custom_cg.error() << std::endl;
    // BiCGStab:
    vcl_result = solve(vcl_matrix, vcl_rhs, bicgstab_tag());
    // GMRES:
    vcl_result = solve(vcl_matrix, vcl_rhs, gmres_tag());

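The tag both carries parameters into the solver (tolerance, iteration limit) and carries results back out (iters(), error()). One plausible way to implement that, sketched below with hypothetical names, is to have the solver take the tag by const reference and write the results into mutable members. Jacobi iteration stands in for CG to keep the sketch short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tag: inputs via the constructor, outputs recorded by the
// solver in mutable members so they can be set through a const reference.
class jacobi_iteration_tag {
public:
  explicit jacobi_iteration_tag(double tol = 1e-8, unsigned max_iter = 300)
      : tol_(tol), max_iter_(max_iter) {}
  double tolerance() const { return tol_; }
  unsigned max_iterations() const { return max_iter_; }
  unsigned iters() const { return iters_; }
  double error() const { return error_; }
  void record(unsigned it, double err) const { iters_ = it; error_ = err; }

private:
  double tol_;
  unsigned max_iter_;
  mutable unsigned iters_ = 0;
  mutable double error_ = 0.0;
};

// Jacobi iteration on a dense row-major system, starting from x = 0 and
// stopping when ||b - A x|| / ||b|| <= tolerance or max_iterations is hit.
std::vector<double> solve(const std::vector<double>& A,
                          const std::vector<double>& b,
                          const jacobi_iteration_tag& tag) {
  const std::size_t n = b.size();
  std::vector<double> x(n, 0.0), x_new(n);
  auto residual_norm = [&](const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      double r = b[i];
      for (std::size_t j = 0; j < n; ++j) r -= A[i * n + j] * v[j];
      s += r * r;
    }
    return std::sqrt(s);
  };
  const double norm_b = residual_norm(x);  // equals ||b|| because x = 0
  double err = (norm_b == 0.0) ? 0.0 : 1.0;
  unsigned it = 0;
  while (it < tag.max_iterations() && err > tag.tolerance()) {
    for (std::size_t i = 0; i < n; ++i) {
      double s = b[i];
      for (std::size_t j = 0; j < n; ++j)
        if (j != i) s -= A[i * n + j] * x[j];
      x_new[i] = s / A[i * n + i];
    }
    x.swap(x_new);
    ++it;
    err = residual_norm(x) / norm_b;  // relative residual
  }
  tag.record(it, err);
  return x;
}
```

After solve() returns, the same tag object reports how many iterations were used and the final relative residual, matching the custom_cg.iters() / custom_cg.error() usage on the slide.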
29 Preconditioning
    using namespace viennacl::linalg;
    //...
    // incomplete LU factorization with threshold
    ilut_tag ilut_config(max_entries,  // max. nonzeros per row in L and U
                         drop_tol,     // entries below this threshold are dropped from L/U
                         true);        // level scheduling: parallel substitution where possible
    ilut_precond<SparseMatrix> vcl_ilut(vcl_matrix, ilut_config);

    // PCG
    vcl_result = solve(vcl_matrix, vcl_rhs, cg_tag(), vcl_ilut);
Other preconditioners: ILU0, Block-ILU, Jacobi, row scaling; experimental: AMG, SPAI

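Applying an ILU preconditioner means solving with the sparse triangular factors, and a triangular solve looks inherently sequential. Level scheduling groups the rows so that rows in the same level depend only on earlier levels and can be processed in parallel. A sketch of the level computation for a lower-triangular CSR matrix follows; it illustrates the idea behind the flag and is not ViennaCL's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Level scheduling for a sparse lower-triangular solve L y = b (CSR).
// Row i depends on every row j < i with l_ij != 0, so
// level(i) = 1 + max(level(j)), or 0 if it depends on nothing.
// Rows within one level are independent of each other.
std::vector<std::vector<std::size_t>> level_schedule(
    const std::vector<std::size_t>& row_ptr,
    const std::vector<std::size_t>& col_idx) {
  assert(!row_ptr.empty());
  const std::size_t n = row_ptr.size() - 1;
  std::vector<std::size_t> level(n, 0);
  std::size_t num_levels = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t lev = 0;
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
      const std::size_t j = col_idx[k];
      assert(j <= i);  // lower triangular
      if (j < i) lev = std::max(lev, level[j] + 1);
    }
    level[i] = lev;
    num_levels = std::max(num_levels, lev + 1);
  }
  std::vector<std::vector<std::size_t>> rows_by_level(num_levels);
  for (std::size_t i = 0; i < n; ++i) rows_by_level[level[i]].push_back(i);
  return rows_by_level;
}
```

The substitution then loops over levels in order and processes all rows of a level in parallel. The benefit depends on the sparsity pattern: a dense chain of dependencies yields one row per level and no parallelism.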
30 pyviennacl
    import pyviennacl as p
    import numpy as np
    from scipy import io
    from my_read_mtx import read_mtx
    # from util import read_mtx, read_vector
    # B = io.mmread("L20.mtx")  # not yet supported

    A = read_mtx("L20.mtx", dtype=np.float64)
    b = p.Vector(20*20, 1.0, dtype=np.float64)
    x = p.Vector(20*20, 1.0, dtype=np.float64)

    tag = p.gmres_tag(tolerance=1e-5, max_iterations=500, krylo
    # tag = p.cg_tag(tolerance=1e-8, max_iterations=150)

    x = p.solve(A, b, tag)

    # Show some info
    print("Num. iterations: %s" % tag.iters)
    print("Estimated error: %s" % tag.error)
    print("True error: %s" % (A*x - b).norm(2))

31 Pros and Cons
Pros:
- Portable!
- Open source
- More linear algebra (uBLAS compatibility, Python bindings)
Cons:
- No MPI
- In development
- Futureproof...?

32 Thank you for your attention! Questions?
More informationHIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach
HIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach Mini-workshop PHyLeaS associated team J. Gaidamour, P. Hénon July 9, 28 HIPS : an hybrid direct/iterative solver /
More informationi486 or Pentium Windows 3.1 PVM MasPar Thinking Machine CM-5 Intel Paragon IBM SP2 telnet/ftp or rlogin
Hidehiko Hasegawa 1983: University of Library and Information Science, the smallest National university March 1994: Visiting Researcher at icl, University of Tennessee, Knoxville 1994-95 in Japan: a bad
More informationCPS343 Parallel and High Performance Computing Project 1 Spring 2018
CPS343 Parallel and High Performance Computing Project 1 Spring 2018 Assignment Write a program using OpenMP to compute the estimate of the dominant eigenvalue of a matrix Due: Wednesday March 21 The program
More informationPyAMG. Algebraic Multigrid Solvers in Python Nathan Bell, Nvidia Luke Olson, University of Illinois Jacob Schroder, University of Colorado at Boulder
PyAMG Algebraic Multigrid Solvers in Python Nathan Bell, Nvidia Luke Olson, University of Illinois Jacob Schroder, University of Colorado at Boulder Copper Mountain 2011 Boot disc For Mac Press and hold
More informationNonsymmetric Problems. Abstract. The eect of a threshold variant TPABLO of the permutation
Threshold Ordering for Preconditioning Nonsymmetric Problems Michele Benzi 1, Hwajeong Choi 2, Daniel B. Szyld 2? 1 CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France (benzi@cerfacs.fr) 2 Department
More informationHPC with PGI and Scalasca
HPC with PGI and Scalasca Stefan Rosenberger Supervisor: Univ.-Prof. Dipl.-Ing. Dr. Gundolf Haase Institut für Mathematik und wissenschaftliches Rechnen Universität Graz May 28, 2015 Stefan Rosenberger
More informationS0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS
S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir
More informationTopic Notes: Message Passing Interface (MPI)
Computer Science 400 Parallel Processing Siena College Fall 2008 Topic Notes: Message Passing Interface (MPI) The Message Passing Interface (MPI) was created by a standards committee in the early 1990
More informationPragma-based GPU Programming and HMPP Workbench. Scott Grauer-Gray
Pragma-based GPU Programming and HMPP Workbench Scott Grauer-Gray Pragma-based GPU programming Write programs for GPU processing without (directly) using CUDA/OpenCL Place pragmas to drive processing on
More informationComputational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors?
Computational Graphics: Lecture 15 SpMSpM and SpMV, or, who cares about complexity when we have a thousand processors? The CVDLab Team Francesco Furiani Tue, April 3, 2014 ROMA TRE UNIVERSITÀ DEGLI STUDI
More informationAccelerated Test Execution Using GPUs
Accelerated Test Execution Using GPUs Vanya Yaneva Supervisors: Ajitha Rajan, Christophe Dubach Mathworks May 27, 2016 The Problem Software testing is time consuming Functional testing The Problem Software
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationDepartment of Informatics V. HPC-Lab. Session 4: MPI, CG M. Bader, A. Breuer. Alex Breuer
HPC-Lab Session 4: MPI, CG M. Bader, A. Breuer Meetings Date Schedule 10/13/14 Kickoff 10/20/14 Q&A 10/27/14 Presentation 1 11/03/14 H. Bast, Intel 11/10/14 Presentation 2 12/01/14 Presentation 3 12/08/14
More informationDue Date: See Blackboard
Source File: ~/2315/11/lab11.(C CPP cpp c++ cc cxx cp) Input: Under control of main function Output: Under control of main function Value: 1 The purpose of this assignment is to become more familiar with
More informationThe performances of R GPU implementations of the GMRES method. Bogdan Oancea University of Bucharest
The performances of R GPU implementations of the GMRES method Bogdan Oancea University of Bucharest bogdan.oancea@faa.unibuc.ro Richard Pospisil Palacky University of Olomouc richard.pospisil@upol.cz Abstract
More informationA Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units
A Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units Markus Wagner, Karl Rupp,2, Josef Weinbub Institute for Microelectronics, TU
More informationEnhanced Oil Recovery simulation Performances on New Hybrid Architectures
Renewable energies Eco-friendly production Innovative transport Eco-efficient processes Sustainable resources Enhanced Oil Recovery simulation Performances on New Hybrid Architectures A. Anciaux, J-M.
More information