Paralution & ViennaCL


1 Paralution & ViennaCL Clemens Schiffer June 12, 2014 Clemens Schiffer (Uni Graz) Paralution & ViennaCL June 12, / 32

2 Introduction

3 Idea of Paralution
- Package for iterative solvers/preconditioners
- Additional abstraction layer between the user's preferred program and varying hardware
- Code independent of platform and hardware backend
- Futureproof

4 Installation
- No root access required
- Install using make/cmake
- Library & header based
- I had to specify the CUDA root directory:
    cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ..
- Set the environment variable:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/paralution/build/lib

5 Using Paralution: Basic Structure
    #include <paralution.hpp>

    using namespace paralution;

    int main(int argc, char* argv[]) {
        init_paralution();
        info_paralution();  // optional

        // your paralution code
        // goes here

        stop_paralution();
        return 0;
    }

6 Compilation and Linking
Use
    g++ -O3 -Wall -I/paralution/build/inc -c main.cpp -o main.o
    g++ -o main main.o -L/paralution/build/lib/ -lparalution -lOpenCL
Or modify your Makefile:
    CXXFLAGS += -I/paralution/build/inc
    LINKFLAGS += -L/paralution/build/lib/ -lparalution -lOpenCL

7 Info Paralution Output
    Number of CPU cores: 8
    Host thread affinity policy - thread mapping on every core
    Number of GPU devices in the system: 1
    PARALUTION ver
    PARALUTION platform is initialized
    Accelerator backend: GPU(CUDA)
    OpenMP threads:8
    Selected GPU device:
    Device number: 0
    Device name: GeForce GTX 680
    totalglobalmem: 4095 MByte
    clockrate:
    compute capability: 3.0
    ECCEnabled:

8 Simple Example: Apply Matrix to Vector
    LocalVector<double> x;
    LocalVector<double> rhs;
    LocalMatrix<double> mat;

    mat.ReadFileMTX("my_matrix.mtx");
    x.ReadFileASCII("my_vector.dat");
    rhs.Allocate("rhs", mat.get_nrow());

    mat.Apply(x, &rhs);  // perform rhs <- Ax
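To make the matrix-vector product on this slide concrete, here is a minimal plain C++ sketch of what rhs <- Ax means for a sparse matrix in CSR storage. It is only an illustration and not PARALUTION's implementation: the CsrMatrix struct and the csr_apply function are hypothetical names introduced here, and CSR is assumed because it is a common sparse format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for this sketch; not a PARALUTION type.
struct CsrMatrix {
    std::size_t nrow = 0;
    std::size_t ncol = 0;
    std::vector<std::size_t> row_ptr;  // size nrow + 1, start of each row in col_idx/val
    std::vector<std::size_t> col_idx;  // column index of each stored entry
    std::vector<double> val;           // value of each stored entry
};

// Compute y = A * x by accumulating the stored entries of one row at a time.
std::vector<double> csr_apply(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.ncol);
    assert(a.row_ptr.size() == a.nrow + 1);
    std::vector<double> y(a.nrow, 0.0);
    for (std::size_t i = 0; i < a.nrow; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.val[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Each row is independent of the others, which is one reason this operation maps well to OpenMP threads or GPU kernels.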

9 Simple Example: On the Accelerator (GPU, ...)
    LocalVector<double> x;
    LocalVector<double> rhs;
    LocalMatrix<double> mat;

    mat.ReadFileMTX("my_matrix.mtx");
    x.ReadFileASCII("my_vector.dat");
    rhs.Allocate("rhs", mat.get_nrow());

    mat.MoveToAccelerator();
    x.MoveToAccelerator();
    rhs.MoveToAccelerator();

    mat.Apply(x, &rhs);  // perform rhs <- Ax

10 Info Paralution Output
Calling mat.info(); will produce the output:
    LocalMatrix name=l100.mtx; rows=10000; cols=10000; nnz=49600; prec=64bit; asm=no; format=csr; host backend={cpu(openmp)}; accelerator backend={opencl}; current=opencl
If an operation cannot be performed efficiently on the accelerator:
    *** warning: LocalMatrix::ConvertTo() is performed on the host


12 Linear Solver: CG
    CG<LocalMatrix<double>, LocalVector<double>, double> ls;

    ls.SetOperator(mat);
    ls.Build();

    ls.Solve(rhs, &x);  // solve Ax = rhs
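For readers who want to see what a CG solve does internally, the following is a textbook conjugate gradient loop in plain C++ on a small dense matrix. It is a sketch under simplifying assumptions (dense row-major storage, a single absolute tolerance, symmetric positive definite A) and not PARALUTION's code; cg_solve, dense_matvec and dot are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dense row-major matrix-vector product y = A * x for an n x n matrix.
Vec dense_matvec(const Vec& a, const Vec& x) {
    const std::size_t n = x.size();
    Vec y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += a[i * n + j] * x[j];
    return y;
}

double dot(const Vec& u, const Vec& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
    return s;
}

// Conjugate gradient for a symmetric positive definite matrix.
// x holds the initial guess on entry and the approximate solution on exit.
// Returns the number of iterations performed.
int cg_solve(const Vec& a, const Vec& rhs, Vec& x, double tol, int max_iter) {
    const std::size_t n = rhs.size();
    assert(a.size() == n * n && x.size() == n);

    Vec ax = dense_matvec(a, x);
    Vec r(n);
    for (std::size_t i = 0; i < n; ++i) r[i] = rhs[i] - ax[i];
    Vec p = r;              // first search direction is the residual
    double rr = dot(r, r);  // squared residual norm

    int it = 0;
    while (it < max_iter && std::sqrt(rr) > tol) {
        Vec ap = dense_matvec(a, p);
        const double alpha = rr / dot(p, ap);  // step length along p
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;       // keeps directions A-conjugate
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```

The only operations on A are matrix-vector products, so the same loop works unchanged with a sparse product in place of dense_matvec.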

13 Linear Solver: PCG
    CG<LocalMatrix<double>, LocalVector<double>, double> ls;
    Jacobi<LocalMatrix<double>, LocalVector<double>, double> p;

    ls.SetOperator(mat);
    ls.SetPreconditioner(p);
    ls.Build();

    ls.Solve(rhs, &x);  // solve Ax = rhs
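A Jacobi preconditioner approximates A by its diagonal D, so applying it to a residual r amounts to z = D^{-1} r. The sketch below shows that single step in plain C++ on a dense row-major matrix. JacobiSketch is a hypothetical class written for illustration and is not the PARALUTION Jacobi class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not part of PARALUTION.
class JacobiSketch {
public:
    // Store the reciprocal of each diagonal entry of the dense
    // row-major n x n matrix a (the "build" phase).
    JacobiSketch(const std::vector<double>& a, std::size_t n) : inv_diag_(n) {
        assert(a.size() == n * n);
        for (std::size_t i = 0; i < n; ++i) {
            assert(a[i * n + i] != 0.0);
            inv_diag_[i] = 1.0 / a[i * n + i];
        }
    }

    // Apply the preconditioner: z = D^{-1} r, one multiplication per entry.
    std::vector<double> apply(const std::vector<double>& r) const {
        assert(r.size() == inv_diag_.size());
        std::vector<double> z(r.size());
        for (std::size_t i = 0; i < r.size(); ++i) z[i] = inv_diag_[i] * r[i];
        return z;
    }

private:
    std::vector<double> inv_diag_;
};
```

In a preconditioned CG loop, z takes the place of r when the search direction is updated; the cost per iteration is a single pass over the vector.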

14 Custom Iteration Control
    CG<LocalMatrix<double>, LocalVector<double>, double> ls;
    Jacobi<LocalMatrix<double>, LocalVector<double>, double> p;

    ls.Init(1e-10,   // abs_tol
            1e-8,    // rel_tol
            1e+8,    // div_tol
            10000);  // max_iter

    ls.SetOperator(mat);
    ls.SetPreconditioner(p);
    ls.Build();

    ls.Solve(rhs, &x);  // solve Ax = rhs
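The four Init arguments can be read as four stopping tests on the residual norm. The sketch below spells out one plausible interpretation in plain C++: absolute tolerance on the residual, relative tolerance and divergence tolerance on the ratio to the initial residual, and an iteration cap. The exact semantics and the order of the checks are assumptions of this sketch; IterControlSketch, StopReason and check_residual are hypothetical names, not PARALUTION API.

```cpp
#include <cassert>

// Possible outcomes of one stopping test (hypothetical, for illustration).
enum class StopReason { Continue, AbsTol, RelTol, Diverged, MaxIter };

// Default values mirror the numbers passed to Init on the slide.
struct IterControlSketch {
    double abs_tol = 1e-10;
    double rel_tol = 1e-8;
    double div_tol = 1e+8;
    int max_iter = 10000;
};

// res0 is the initial residual norm, res the current one, iter the
// number of iterations completed so far.
StopReason check_residual(const IterControlSketch& c, double res0, double res, int iter) {
    assert(res0 > 0.0 && res >= 0.0);
    if (res <= c.abs_tol) return StopReason::AbsTol;           // small in absolute terms
    if (res / res0 <= c.rel_tol) return StopReason::RelTol;    // reduced enough
    if (res / res0 >= c.div_tol) return StopReason::Diverged;  // blew up
    if (iter >= c.max_iter) return StopReason::MaxIter;        // out of budget
    return StopReason::Continue;
}
```

A solver loop would call such a check once per iteration and stop on anything other than Continue.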

15 Available Solvers/Preconditioners


17 Switching the Backend
- No recompilation needed, just switch the library
- E.g. with a second installation in /paralution_cl, built with
    cmake -DSUPPORT_CUDA=OFF -DSUPPORT_OCL=ON ..
  just changing
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/paralution_cl/build/lib
  will make the executable run using OpenCL.

18 Matlab Plug-in
- Consists of an example file, paralution_pcg.cpp, that can be modified easily
- Compile it into a MEX-file
- It can then be called in MATLAB as a normal function

19 Matlab Plug-in: Details
Required some extra attention:
- Finding mex:
    export PATH=$PATH:/usr/local/MATLAB/R2013a/bin
- Using an older compiler:
    sudo rm /usr/bin/gcc
    sudo rm /usr/bin/g++
    sudo ln -s /usr/bin/gcc-4.4 /usr/bin/gcc
    sudo ln -s /usr/bin/g /usr/bin/g++
    cd /usr/local/MATLAB/R2013a/sys/os/glnxa64
    sudo unlink libstdc++.so.6
    sudo ln -s /usr/lib/libstdc++.so.6

20 Other Plugins
- FORTRAN
- OpenFOAM
- Deal.II
- Elmer
- Hermes/Agros2D

21 Pros
- Easy to use
- Portable
- Open Source
- Many preconditioners/solvers
Cons
- No MPI yet
- No stencils
- (No CUDA for CC < 2.0)
- In development
- Futureproof...?

22 Introduction


23 Idea of ViennaCL
- Linear algebra and iterative solvers/preconditioners
- Additional abstraction layer between the user's preferred program and varying hardware
- Code independent of platform and hardware backend: header based

24 Details
- More linear algebra: dense matrices, slicing, extraction, etc.
- Compatible with uBLAS: just changing the namespace is enough
- Completely header based, no installation needed
- CMake is only required to build the examples (highly recommended, though)

25 Simple Example
    #include "viennacl/scalar.hpp"
    // ...
    using namespace viennacl;
    // ...
    typedef float ScalarType;

    matrix<ScalarType> vcl_a(N, M);
    vector<ScalarType> vcl_x(M);
    vector<ScalarType> vcl_rhs(N);

    std::vector<ScalarType> stl_x(M);      // standard vectors ensure
    std::vector<ScalarType> stl_a(N * M);  // linear memory -> fast_copy
    // ... fill with data

    fast_copy(&(stl_a[0]), &(stl_a[0]) + stl_a.size(), vcl_a);
    fast_copy(&(stl_x[0]), &(stl_x[0]) + stl_x.size(), vcl_x);

    vcl_rhs = linalg::prod(vcl_a, vcl_x);
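The slide stores the N x M host matrix in one std::vector of length N*M so that it occupies contiguous ("linear") memory. As an illustration of that layout, the plain C++ sketch below computes the same product on the host, assuming row-major order so that entry (i, j) sits at index i*M + j. The row-major choice and the flat_prod name are assumptions of this sketch, not something the slide states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for an n_rows x n_cols matrix stored row by row in one
// flat vector, so that entry (i, j) lives at index i * n_cols + j.
std::vector<float> flat_prod(const std::vector<float>& a,
                             std::size_t n_rows,
                             std::size_t n_cols,
                             const std::vector<float>& x) {
    assert(a.size() == n_rows * n_cols);
    assert(x.size() == n_cols);
    std::vector<float> y(n_rows, 0.0f);
    for (std::size_t i = 0; i < n_rows; ++i) {
        for (std::size_t j = 0; j < n_cols; ++j) {
            y[i] += a[i * n_cols + j] * x[j];
        }
    }
    return y;
}
```

A host reference like this is also handy for checking the result copied back from the device.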

26 Direct Solvers
    using namespace viennacl;

    matrix<ScalarType> vcl_a;
    vector<ScalarType> vcl_rhs;
    // ... fill with data

    // LU factorization, then forward/backward substitution:
    linalg::lu_factorize(vcl_a);
    linalg::lu_substitute(vcl_a, vcl_rhs);  // vcl_rhs now holds the solution
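The two calls on this slide correspond to the two classic phases of a dense direct solve. The plain C++ sketch below shows them on a small row-major matrix: an in-place LU factorization followed by forward and backward substitution that overwrites the right-hand side. It omits pivoting for brevity, which is an assumption of this sketch, and the function names lu_factorize_sketch and lu_substitute_sketch are hypothetical; this is not ViennaCL's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place LU factorization without pivoting of a dense row-major
// n x n matrix. Afterwards the strict lower triangle holds L (unit
// diagonal implied) and the upper triangle holds U.
void lu_factorize_sketch(std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    for (std::size_t k = 0; k < n; ++k) {
        assert(a[k * n + k] != 0.0);  // no pivoting: pivot must be nonzero
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i * n + k] /= a[k * n + k];  // multiplier l_ik
            for (std::size_t j = k + 1; j < n; ++j) {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
}

// Overwrite b with the solution x of L U x = b.
void lu_substitute_sketch(const std::vector<double>& lu, std::size_t n,
                          std::vector<double>& b) {
    assert(lu.size() == n * n && b.size() == n);
    // Forward substitution: solve L y = b (unit diagonal).
    for (std::size_t i = 1; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j)
            b[i] -= lu[i * n + j] * b[j];
    // Backward substitution: solve U x = y.
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j)
            b[i] -= lu[i * n + j] * b[j];
        b[i] /= lu[i * n + i];
    }
}
```

Once the factorization exists, further right-hand sides only need the substitution step, which costs O(n^2) instead of O(n^3).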

27 Iterative Solvers
    using namespace viennacl::linalg;
    // ...
    compressed_matrix<ScalarType> vcl_matrix;
    // ... fill with data

    // conjugate gradient:
    vcl_result = solve(vcl_matrix, vcl_rhs, cg_tag());

    // BiCGStab:
    vcl_result = solve(vcl_matrix, vcl_rhs, bicgstab_tag());

    // GMRES:
    vcl_result = solve(vcl_matrix, vcl_rhs, gmres_tag());

28 Iteration Control
    using namespace viennacl::linalg;
    // ...
    compressed_matrix<ScalarType> vcl_matrix;
    // ... fill with data

    cg_tag custom_cg(1e-10, 100);  // relative tol, max_iter

    // conjugate gradient:
    vcl_result = solve(vcl_matrix, vcl_rhs, custom_cg);
    cout << "No. of iters: " << custom_cg.iters() << endl;
    cout << "Est. error: " << custom_cg.error() << endl;

    // BiCGStab:
    vcl_result = solve(vcl_matrix, vcl_rhs, bicgstab_tag());

    // GMRES:
    vcl_result = solve(vcl_matrix, vcl_rhs, gmres_tag());

29 Preconditioning
    using namespace viennacl::linalg;
    // ...
    // Incomplete LU factorization with threshold
    ilut_tag ilut_config(max_entries,  // number of nonzero entries per row in L/U
                         drop_tol,     // drop tolerance: smaller entries of L/U are discarded
                         true);        // level scheduling: substitution in parallel if possible
    ilut_precond<SparseMatrix> vcl_ilut(vcl_matrix, ilut_config);

    // PCG
    vcl_result = solve(vcl_matrix, vcl_rhs, cg_tag(), vcl_ilut);
Other preconditioners: ILU0, Block-ILU, Jacobi, Row Scaling. Experimental: AMG, SPAI.

30 pyviennacl
    import pyviennacl as p
    import numpy as np
    from scipy import io
    from my_read_mtx import read_mtx
    # from util import read_mtx, read_vector

    # B = io.mmread("L20.mtx")  # not yet supported
    A = read_mtx("L20.mtx", dtype=np.float64)
    b = p.Vector(20*20, 1.0, dtype=np.float64)
    x = p.Vector(20*20, 1.0, dtype=np.float64)

    tag = p.gmres_tag(tolerance=1e-5, max_iterations=500, krylo
    # tag = p.cg_tag(tolerance=1e-8, max_iterations=150)
    x = p.solve(A, b, tag)

    # Show some info
    print("Num. iterations: %s" % tag.iters)
    print("Estimated error: %s" % tag.error)
    print("True error: %s" % (A*x-b).norm(2))

31 Pros
- Portable!
- Open Source
- More linear algebra (uBLAS, py)
Cons
- No MPI
- In development
- Futureproof...?

32 Thank you for your attention! Questions?
