Scientific Software Ecosystems


1 Scientific Software Ecosystems Michael A. Heroux Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL

2 My Background
1998 to now: Staff member at Sandia National Labs. Lead these projects:
Trilinos: collection of scientific libraries (more later), trilinos.org.
xsdk: meta-collection of scientific libraries (more later), xsdk.info.
ECP libraries: oversight of 15 Exascale Computing Project libraries for DOE.
Mantevo: miniapps project for HPC co-design, mantevo.org.
IDEAS Productivity: scientific software productivity, ideas-productivity.org.
HPCG Benchmark: complementary benchmark for the Top 500, hpcg-benchmark.org.
Concerned with scalable algorithms for HPC.
Concurrent: Scientist in Residence, St. John's University, MN, USA.
Before Sandia: staff member at Cray Research. '88 to '93: math libraries developer (sparse solvers, LAPACK, BLAS); then application analyst in the computational engineering group; then scalable systems applications specialist.
Always: interested in numerical linear algebra for HPC. 2

3 Goals for this presentation Motivate the need for and value of reusable scientific software. Understand the sparse linear algebra ecosystem. Look ahead to next-generation systems. 3

4 Basic Concepts
Framework: APIs plus working software (defaults); inversion of control; extensibility. Scope: big, ubiquitous.
Toolkit: plug-and-play libraries, insertable. Scope: small, local.
Lightweight framework: goal is the best of framework and toolkit.
Ecosystem: everything. 4

5 Modern Scientific App Design Goal
Classic approach: Develop an application. The app has its own framework, with no reuse intended; it makes some use of libraries (toolkit components).
Desired approach: Compose the application within an ecosystem. Adapt lightweight framework elements; for example, use CMake, Doxygen, and unit testing frameworks. Integrate and tune libraries: load balancing, solvers, etc. 5

6 Extreme-scale Science Application (MyApp)
Domain component interfaces: data mediator interactions; hierarchical organization; multiscale/multiphysics coupling.
Shared data objects: meshes; matrices, vectors.
Library interfaces: parameter lists; interface adapters; function calls.
Native code & data objects: single-use code; coordinated component use; application specific.
Documentation content: source markup; embedded examples.
Testing content: unit tests; test fixtures.
Build content: rules; parameters.
Extreme-Scale Scientific Software Ecosystem / Extreme-Scale Scientific SW Dev Kit (xsdk):
Domain components: reacting flow, etc. Reusable.
Libraries: solvers, etc. Interoperable.
Frameworks & tools: doc generators; test and build frameworks.
SW engineering: productivity tools; models, processes. 6

7 MyApp: Small fraction of total lines of code. Orchestrates use of a large body of existing software. (Same application/ecosystem diagram as slide 6.) 7

8 MyApp: Small fraction of total lines of code. Orchestrates use of a large body of existing software. (Same diagram as slide 6.) 8
xsdk: Large collection of modular, parametrizable, reusable software components, tools, and policies. State of the art, always improving.

9 Some Popular Ecosystems (Frameworks) Cactus: FEniCS: Charm++: PETSc: 9

10 Matlab
Matrix Laboratory: industrial-quality technical computing platform. Many toolboxes for important problem domains. A very rational productivity option on a single compute node. Some distributed parallel support, but more complicated.
Solve Ax = b in Matlab? x = A\b;
The backslash operator represents a complex decision tree. Considerations: size, sparsity, condition number, ... Tim Davis: the backslash guy.
If Matlab works for your problem sizes, use it. In other cases it makes a great prototyping environment. 10

11 Problem Solving Environments
Many productivity-enhancing environments: NumPy, Julia, others.
Python wrappers: SWIG-based and others. Wrap high-performance libraries underneath. Example: PyTrilinos.
Can be the right tool (and compete with Matlab), especially for exploration, but even for production settings.
Not generally used on supercomputers, although always discussed. 11

12 Why Use Math Libraries 12

13 A farmer had chickens and pigs. There was a total of 60 heads and 200 feet. How many chickens and how many pigs did the farmer have?
Let x be the number of chickens, y the number of pigs. Then:
x + y = 60
2x + 4y = 200
From the first equation x = 60 - y, so replace x in the second equation: 2(60 - y) + 4y = 200.
Solve for y: 120 - 2y + 4y = 200, so 2y = 80, y = 40.
Solve for x: x = 60 - 40 = 20.
The farmer has 20 chickens and 40 pigs. 13

14 A restaurant owner purchased one box of frozen chicken and another box of frozen pork for $60. Later the owner purchased 2 boxes of chicken and 4 boxes of pork for $200. What is the cost of a box of frozen chicken and a box of frozen pork?
Let x be the price of a box of chicken, y the price of a box of pork. Then:
x + y = 60
2x + 4y = 200
From the first equation x = 60 - y, so replace x in the second equation: 2(60 - y) + 4y = 200.
Solve for y: 120 - 2y + 4y = 200, so 2y = 80, y = 40.
Solve for x: x = 60 - 40 = 20.
A box of chicken costs $20 and a box of pork costs $40. 14

15 Problem Statement: A restaurant owner purchased one box of frozen chicken and another box of frozen pork for $60. Later the owner purchased 2 boxes of chicken and 4 boxes of pork for $200. What is the cost of a box of frozen chicken and a box of frozen pork?
Variables: Let x be the price of a box of chicken, y the price of a box of pork.
Problem Setup:
x + y = 60
2x + 4y = 200
Solution Method: From the first equation x = 60 - y, so replace x in the second equation: 2(60 - y) + 4y = 200. Solve for y: 120 - 2y + 4y = 200, so 2y = 80, y = 40. Solve for x: x = 60 - 40 = 20.
Translate Back: A box of chicken costs $20. A box of pork costs $40. 15

16 Why Math Libraries?
Many types of problems. Similar mathematics.
App separation of concerns:
- Problem statement.
- Translation to math.
- Set up problem.
- Solve problem (e.g., hand off to SuperLU).
- Translate back. 16
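To make the "solve problem" step concrete, here is a minimal sketch of handing the 2x2 system above to a library instead of solving it by hand. It uses the LAPACKE C interface to LAPACK's dgesv; the availability of lapacke.h and the -llapacke link flag are assumptions about your installation.

#include <cstdio>
#include <lapacke.h>

int main() {
  // Row-major A = [1 1; 2 4] and right-hand side b = [60; 200].
  double A[4] = {1.0, 1.0,
                 2.0, 4.0};
  double b[2] = {60.0, 200.0};
  lapack_int ipiv[2]; // pivot indices from the LU factorization

  // dgesv factors A = LU and solves in place: on exit b holds x.
  lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 2, 1, A, 2, ipiv, b, 1);
  if (info != 0) return 1; // info > 0 means A was singular

  std::printf("chicken = %g, pork = %g\n", b[0], b[1]); // expect 20 and 40
  return 0;
}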

17 Importance of Math Libraries
Computer solution of math problems is hard:
Floating point arithmetic is not exact: 1 + ε = 1 for small ε > 0; (a + b) + c is not always equal to a + (b + c).
High fidelity leads to large problems: 1M to 10B equations.
Clusters require coordinated solution across 100 to 1M processors.
Sophisticated solution algorithms and libraries are leveraged:
Solver expertise is highly specialized and expensive.
Write code once, use it in many settings.
Maintenance cost of successful software? 70 to 80% of the total. Roll-your-own is not just about writing the code; use of libraries spreads that cost. 17
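Both floating-point facts above are easy to observe. A small self-contained C++ demonstration, with values chosen for IEEE double precision:

#include <iostream>

int main() {
  // 1 + eps == 1 for small eps > 0: eps below machine epsilon is absorbed.
  double eps = 1.0e-17; // double's machine epsilon is about 2.2e-16
  std::cout << ((1.0 + eps) == 1.0) << "\n"; // prints 1 (true)

  // (a + b) + c need not equal a + (b + c).
  double a = 1.0e16, b = -1.0e16, c = 1.0;
  std::cout << ((a + b) + c) << "\n"; // prints 1
  std::cout << (a + (b + c)) << "\n"; // prints 0: c is lost when added to b
  return 0;
}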

18 Your Turn: List reasons to use or not use libraries 18

19 Your Turn: List reasons to use or not use libraries
Leverage investment across more than one app.
Separate the concerns of using functionality from providing it.
Cost of owning software is development + maintenance.
Risk if the library team stops development. Mitigate by having your own API, with adapters for multiple libraries.
Lack of availability on all systems (including emerging ones). Mitigate by having your own (inferior) portable capability.
Complexity increase from library dependencies. Mitigate by using libraries with similar complexity. 19

20 Sparse Direct Methods
Construct L and U, lower and upper triangular respectively, such that LU = A.
Solve Ax = b in two steps: 1. Ly = b; 2. Ux = y.
Symmetric versions: LL^T = A, LDL^T = A.
When are direct methods effective?
1D: Always, even on many, many processors.
2D: Almost always, except on many, many processors.
2.5D: Most of the time.
3D: Only for small/medium problems on small/medium processor counts.
Bottom line: direct sparse solvers should always be in your toolbox. 20
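The two triangular solves are simple once the factors exist. The sketch below shows them for a dense, row-major matrix; a real sparse direct solver performs the same operations on the sparse factored structure.

#include <vector>

// Forward substitution: solve L y = b, L lower triangular, nonzero diagonal.
std::vector<double> forwardSolve(const std::vector<double>& L,
                                 const std::vector<double>& b, int n) {
  std::vector<double> y(n);
  for (int i = 0; i < n; ++i) {
    double sum = b[i];
    for (int j = 0; j < i; ++j) sum -= L[i*n + j] * y[j];
    y[i] = sum / L[i*n + i];
  }
  return y;
}

// Back substitution: solve U x = y, U upper triangular, nonzero diagonal.
std::vector<double> backSolve(const std::vector<double>& U,
                              const std::vector<double>& y, int n) {
  std::vector<double> x(n);
  for (int i = n - 1; i >= 0; --i) {
    double sum = y[i];
    for (int j = i + 1; j < n; ++j) sum -= U[i*n + j] * x[j];
    x[i] = sum / U[i*n + i];
  }
  return x;
}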

21 Sparse Direct Solver Packages
HSL, MUMPS, Pardiso, PaStiX, SuiteSparse, SuperLU, UMFPACK, WSMP, Trilinos/Amesos/Amesos2.
Notes:
All have threaded parallelism.
All but SuiteSparse and UMFPACK have distributed-memory (MPI) parallelism.
MUMPS, PaStiX, SuiteSparse, SuperLU, Trilinos, and UMFPACK are freely available.
HSL, Pardiso, and WSMP are available freely, with restrictions.
Some research efforts on GPUs; I am unaware of any products.
Emerging hybrid packages:
STRUMPACK: Sherry Li, Pieter Ghysels.
HIPS: Gaidamour, Henon.
Trilinos/ShyLU: Rajamanickam et al. 21

22 Other Sparse Direct Solver Packages
Legacy packages that are open source but not under active development today: TAUCS, PSPASES, BCSLib.
Eigen: newer and active, but sequential only (for sparse solvers). Sparse Cholesky (including LDL^T), sparse LU, sparse QR. Wrappers to quite a few third-party sparse direct solvers. 22

23 Emerging Trend in Sparse Direct
New work on low-rank approximations to off-diagonal blocks.
Typically: off-diagonal blocks in the factorization are stored as dense matrices.
New: these blocks have low rank (up to the accuracy needed for solution), so they can be represented by an approximate SVD.
Still uncertain how broad the impact will be. Will the off-diagonal blocks continue to have low rank for hard problems?
Potential: could be a breakthrough for extending sparse direct methods to much larger 3D problems. 23

24 Iterative Methods
Given an initial guess for x, called x_0 (x_0 = 0 is acceptable), compute a sequence x_k, k = 1, 2, ..., such that each x_k is closer to x.
Definition of "close": Suppose x_k = x exactly for some value of k. Then r_k = b - A x_k = 0 (the vector of all zeros), and norm(r_k) = sqrt(dot(r_k, r_k)) = 0 (a number).
For any x_k, let r_k = b - A x_k. If norm(r_k) = sqrt(dot(r_k, r_k)) is small (< 1.0e-6, say), then we say that x_k is close to x.
The vector r_k is called the residual vector. 24
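The residual test above is exactly the stopping criterion used in practice. As an illustration (not one of the production solvers discussed later), here is a compact conjugate gradient iteration for symmetric positive definite A, written against a generic matrix-vector product:

#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& u, const Vec& v) {
  double s = 0.0;
  for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
  return s;
}

// Conjugate gradient for SPD A, starting from x_0 = 0.
Vec cg(const std::function<Vec(const Vec&)>& matvec, const Vec& b,
       double tol = 1.0e-6, int maxIters = 1000) {
  Vec x(b.size(), 0.0); // x_0 = 0 is an acceptable initial guess
  Vec r = b;            // r_0 = b - A x_0 = b
  Vec p = r;
  double rr = dot(r, r);
  for (int k = 0; k < maxIters && std::sqrt(rr) >= tol; ++k) {
    Vec Ap = matvec(p);
    double alpha = rr / dot(p, Ap);
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += alpha * p[i]; // update iterate
      r[i] -= alpha * Ap[i]; // update residual
    }
    double rrNew = dot(r, r);
    double beta = rrNew / rr;
    rr = rrNew;
    for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
  }
  return x; // final residual norm is sqrt(rr)
}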

25 Sparse Iterative Solver Packages
PETSc, hypre, Trilinos, Paralution (manycore; GPL/commercial license), HSL (academic/commercial license), Eigen (sequential CG, BiCGSTAB, ILUT/Sparskit), Sparskit.
Notes:
There are many other efforts, but I am unaware of any that have a broad user base like hypre, PETSc, and Trilinos.
Sparskit, and other software by Yousef Saad, is not a product with a large official user base, but these codes appear as embedded (serial) source code in many applications.
PETSc and Trilinos support threading, distributed memory (MPI), and growing functionality for accelerators.
Many of the direct solver packages support some kind of iteration, if only iterative refinement. 25

26 Which Type of Solver to Use?
Dimension | Type | Notes
1D | Direct | Often tridiagonal (Thomas algorithm; periodic version).
2D, very easy | Iterative | If you have a good initial guess, e.g., a transient simulation.
2D, otherwise | Direct | Almost always better than iterative.
2.5D | Direct | Example: shell problems. Good ordering can keep fill low.
3D, smooth | Direct? | Emerging methods for low-rank SVD representation.
3D, easy | Iterative | Simple preconditioners: diagonal scaling. CG or BiCGSTAB.
3D, harder | Iterative | Preconditioners: IC, ILU (with domain decomposition if in parallel).
3D, hard | Iterative | Use GMRES (without restart if possible).
3D + large | Iterative | Add multigrid, geometric or algebraic. 26

27 Details about Sparse Matrices 27

28 General Sparse Matrix Example:

A = [ a11  0    0    0    0    a16 ]
    [ 0    a22  a23  0    0    0   ]
    [ 0    a32  a33  0    a35  0   ]
    [ 0    0    0    a44  0    0   ]
    [ 0    0    a53  0    a55  a56 ]
    [ a61  0    0    0    a65  a66 ] 28

29 Compressed Row Storage (CRS) Format
AKA CSR format. Idea: Create
1 length-nnz array of nonzero values,
1 length-nnz array of column indices,
1 length-(m+1) array of row pointers:

double * values      = new double[nnz];
int    * colindices  = new int[nnz];
int    * rowpointers = new int[m+1];

nnz: number of nonzero terms in the matrix. m: matrix dimension. 29

30 Compressed Row Storage (CRS) Format
Fill the arrays as follows:

rowpointers[0] = 0;
double * curvalueptr = values;
int * curindicesptr = colindices;
for (int i = 0; i < m; ++i) { // for each row
  int numrowentries = /* number of nonzero entries in row i */;
  rowpointers[i+1] = rowpointers[i] + numrowentries;
  for (int j = 0; j < numrowentries; ++j) { // for each entry in row i
    *curvalueptr++   = /* value of j-th nonzero entry in row i */;
    *curindicesptr++ = /* column index of j-th nonzero entry in row i */;
  }
} 30

31 CRS Example

A = [ 4  0  0  1 ]
    [ 0  3  0  2 ]
    [ 0  0  6  0 ]
    [ 5  0  9  8 ]

values = {4, 1, 3, 2, 6, 5, 9, 8}
colindices = {0, 3, 1, 3, 2, 0, 2, 3}
rowpointers = {0, 2, 4, 5, 8} 31

32 Your turn: CRS Example

A = [ 4  2  0  0 ]
    [ 0  3  0  2 ]
    [ 0  0  0  0 ]
    [ 5  7  9  8 ]

values =
colindices =
rowpointers = 32

33 Your turn: CRS Example

A = [ 4  2  0  0 ]
    [ 0  3  0  2 ]
    [ 0  0  0  0 ]
    [ 5  7  9  8 ]

values = {4, 2, 3, 2, 5, 7, 9, 8}
colindices = {0, 1, 1, 3, 0, 1, 2, 3}
rowpointers = {0, 2, 4, 4, 8} 33

34 Serial Sparse MV

int sparsemv(int m, double * values, int * colindices, int * rowpointers,
             double * x, double * y) {
  for (int i = 0; i < m; ++i) {
    double sum = 0.0;
    int curnumentries = rowpointers[i+1] - rowpointers[i];
    double * curvals = &values[rowpointers[i]];
    int * curinds = &colindices[rowpointers[i]];
    for (int j = 0; j < curnumentries; ++j)
      sum += curvals[j] * x[curinds[j]];
    y[i] = sum;
  }
  return 0;
} 34
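Applying sparsemv to the CRS arrays from the slide-31 example, with x the vector of all ones (this assumes the sparsemv function above is in scope):

int main() {
  double values[]      = {4, 1, 3, 2, 6, 5, 9, 8};
  int    colindices[]  = {0, 3, 1, 3, 2, 0, 2, 3};
  int    rowpointers[] = {0, 2, 4, 5, 8};
  double x[4] = {1.0, 1.0, 1.0, 1.0};
  double y[4];
  sparsemv(4, values, colindices, rowpointers, x, y);
  // Row sums of A: y = {4+1, 3+2, 6, 5+9+8} = {5, 5, 6, 22}.
  return 0;
}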

35 Many, Many Sparse Formats
ELLPACK: Determine the max number of nonzeros per row. Make the nonzero count constant for all rows; pad with zeros for rows with fewer. Eliminates the rowpointers array. Works well if the nonzero count is nearly uniform. (A minimal SpMV sketch for this layout follows.)
Hybrid CRS/ELLPACK, and many more variations.
CCS: column version. Has a transpose relationship with CRS. More common for sparse direct solvers. 35
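For contrast with the CRS sparsemv above, a minimal sketch of the same kernel in ELLPACK layout; the row-major values/colindices arrays of size m*maxnnz, with zero-padding and valid column indices in padded slots, replace the rowpointers array:

int ellpackmv(int m, int maxnnz, const double * values, const int * colindices,
              const double * x, double * y) {
  for (int i = 0; i < m; ++i) {
    double sum = 0.0;
    for (int j = 0; j < maxnnz; ++j) {
      // Entry j of row i lives at i*maxnnz + j; padded zeros add nothing.
      sum += values[i*maxnnz + j] * x[colindices[i*maxnnz + j]];
    }
    y[i] = sum;
  }
  return 0;
}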

36 Trilinos Overview 36

37 What is Trilinos? Object-oriented software framework for Solving big complex science & engineering problems. Large collection of reusable scientific capabilities. More like LEGO bricks than Matlab. 37

38 Background/Motivation 38

39 Optimal Kernels to Optimal Solutions:
- Geometry, meshing.
- Discretizations, load balancing.
- Scalable linear, nonlinear, eigen, transient, optimization, UQ solvers.
- Scalable I/O, GPU, manycore.
- 60+ packages.
- Other distributions: Cray LIBSCI; public repo.
- Thousands of users. Worldwide distribution. Laptops to leadership systems.

40 Trilinos Strategic Goals
Algorithmic goals:
Scalable Computations: As problem size and processor counts increase, the cost of the computation will remain nearly fixed.
Hardened Computations: Never fail unless the problem is essentially intractable, in which case we diagnose and inform the user why the problem fails and provide a reliable measure of error.
Full Vertical Coverage: Provide leading-edge enabling technologies through the entire technical application software stack: from problem construction through solution, analysis, and optimization.
Software goals:
Universal Interoperability: All Trilinos packages, and important external packages, will be interoperable, so that any combination of packages and external software (e.g., PETSc, hypre) that makes sense algorithmically will be possible within Trilinos.
Universal Accessibility: All Trilinos capabilities will be available to users of major computing environments: C++, Fortran, Python, and the Web, and from the desktop to the latest scalable systems.
Universal Solver RAS: Trilinos will be Reliable (leading-edge hardened, scalable solutions for each of these applications), Available (integrated into every major application at Sandia), and Serviceable (self-sustaining).

41 Product Leaders: Layer of Proactive Leadership
Products: Framework (J. Willenbring); Data Services (K. Devine); Linear Solvers (S. Rajamanickam); Nonlinear Solvers (R. Pawlowski); Discretizations (M. Perego).
Product focus: new, stronger leadership model. Focus on published APIs, high cohesion within a product, low coupling across products, and deliberate product-level upstream planning & design. 41

42 Unique Features of Trilinos
Huge library of algorithms: linear and nonlinear solvers, preconditioners; optimization, transients, sensitivities, uncertainty, ...
Growing support for multicore & hybrid CPU/GPU: built into the new Tpetra linear algebra objects, and therefore into iterative solvers with zero effort! Unified intranode programming model: Kokkos. Spreading into the whole stack: multigrid, sparse factorizations, element assembly.
Support for mixed and arbitrary precisions: don't have to rebuild Trilinos to use it.
Support for flexible 2D sparse partitioning: useful for graph analytics and other data science apps.
Support for huge (> 2B unknowns) problems. 42

43 Trilinos Access
Trilinos current: 58 packages.
Website:
GitHub (preferred): 43

44 Trilinos software organization 44

45 Trilinos Package Summary
Objective | Package(s)
Discretizations:
Meshing & Discretizations | STK, Intrepid, Pamgen, Sundance, ITAPS, Mesquite
Time Integration | Rythmos
Methods:
Automatic Differentiation | Sacado
Mortar Methods | Moertel
Services:
Linear algebra objects | Epetra, Tpetra, Kokkos, Xpetra
Interfaces | Thyra, Stratimikos, RTOp, FEI, Shards
Load Balancing | Zoltan, Isorropia, Zoltan2
Skins | PyTrilinos, WebTrilinos, ForTrilinos, CTrilinos, Optika
C++ utilities, I/O, thread API | Teuchos, EpetraExt, Kokkos, Triutils, ThreadPool, Phalanx, Trios
Solvers:
Iterative linear solvers | AztecOO, Belos, Komplex
Direct sparse linear solvers | Amesos, Amesos2, ShyLU
Direct dense linear solvers | Epetra, Teuchos, Pliris
Iterative eigenvalue solvers | Anasazi, RBGen
ILU-type preconditioners | AztecOO, IFPACK, Ifpack2, ShyLU
Multilevel preconditioners | ML, CLAPS, MueLu
Block preconditioners | Meros, Teko
Nonlinear system solvers | NOX, LOCA, Piro
Optimization (SAND) | MOOCHO, Aristos, TriKota, GlobiPack, OptiPack
Stochastic PDEs | Stokhos

46 Interoperability vs. Dependence ("Can Use" vs. "Depends On")
Although most Trilinos packages have no explicit dependence, packages often must interact with some other packages:
- NOX needs operator, vector, and linear solver objects.
- AztecOO needs preconditioner, matrix, operator, and vector objects.
Interoperability is enabled at configure time. The Trilinos CMake system is the vehicle for establishing interoperability of Trilinos components without compromising individual package autonomy (the Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES option).
The architecture supports simultaneous development on many fronts. 46

47 Trilinos is Made of Packages
Not a monolithic piece of software: like LEGO bricks, not Matlab.
Each package:
- Has its own development team and management.
- Makes its own decisions about algorithms, coding style, etc.
- May or may not depend on other Trilinos packages.
Trilinos is not indivisible:
- You don't need all of Trilinos to get things done.
- Any subset of packages can be combined and distributed.
- The current public release contains ~50 of the 55+ Trilinos packages.
Trilinos top-layer framework:
- Not a large amount of source code: ~1.5%.
- Manages package dependencies, like a GNU/Linux package manager.
- Runs package tests nightly, and on every check-in.
The package model supports multifrontal development. New effort to create apps by gluing Trilinos together: Albany. 47

48 48 Software Development and Delivery

49 Are C++ Templates Safe? No, but they are good.
Templates and sanity upon a shifting foundation: compile-time polymorphism.
Software delivery is an essential activity. How can we: implement mixed-precision algorithms? Implement generic fine-grain parallelism? Support hybrid CPU/GPU computations? Support extended precision? Explore redundant computations? Prepare for both exascale swim lanes? C++ templates are the only sane way, for now.
Template benefits: compile-time polymorphism; true generic programming; no runtime performance hit; strong typing for mixed precision; support for extended precision; many more.
Template drawbacks: huge compile-time performance hit (but good use of multicore :) and eliminated for common data types); complex notation, especially for Fortran & C programmers (can insulate to some extent). 49

50 Solver Software Stack (Phase I packages: SPMD, int/double; Phase II packages: templated)
From bottom to top:
Teuchos: common utilities.
Epetra: distributed linear algebra (matrix/graph and vector problems).
AztecOO: linear equations. Anasazi: eigenproblems. Ifpack, ML, etc.: preconditioners.
NOX: nonlinear problems.
LOCA: bifurcation analysis.
Tempus: transient problems (DAEs/ODEs).
ROL: optimization (unconstrained and constrained).
Sacado: sensitivities (automatic differentiation). 50

51 Solver Software Stack (Phase I and II packages, plus Phase III packages: manycore*, templated)
Each Phase I/II layer gains a templated, manycore-capable counterpart:
Kokkos*: on-node parallelism foundation.
Epetra, now also Tpetra*: distributed linear algebra.
AztecOO, now also Belos*: linear equations. Anasazi: eigenproblems. Ifpack, ML, etc., now also Ifpack2*, MueLu*, etc.: preconditioners.
NOX / T-NOX: nonlinear problems. LOCA / T-LOCA: bifurcation analysis.
Tempus: transient problems (DAEs/ODEs). ROL: optimization. Sacado: sensitivities (automatic differentiation). Teuchos: common utilities. 51

52 52 Using Trilinos Linear Solvers

53 Trilinos Package Summary
Objective | Package(s)
Discretizations:
Meshing & Discretizations | STKMesh, Intrepid, Pamgen, Sundance, Mesquite
Time Integration | Rythmos
Methods:
Automatic Differentiation | Sacado
Mortar Methods | Moertel
Services:
Linear algebra objects | Epetra, Tpetra
Interfaces | Xpetra, Thyra, Stratimikos, RTOp, FEI, Shards
Load Balancing | Zoltan, Isorropia, Zoltan2
Skins | PyTrilinos, WebTrilinos, ForTrilinos, CTrilinos, Optika
Utilities, I/O, thread API | Teuchos, EpetraExt, Kokkos, Triutils, ThreadPool, Phalanx
Solvers:
Iterative linear solvers | AztecOO, Belos, Komplex
Direct sparse linear solvers | Amesos, Amesos2, ShyLU
Incomplete factorizations | AztecOO, IFPACK, Ifpack2
Multilevel preconditioners | ML, CLAPS, MueLu
Direct dense linear solvers | Epetra, Teuchos, Pliris
Iterative eigenvalue solvers | Anasazi
Block preconditioners | Meros, Teko
Nonlinear solvers | NOX, LOCA
Optimization | MOOCHO, Aristos, TriKota, GlobiPack, OptiPack
Stochastic PDEs | Stokhos

54 AztecOO
Iterative linear solvers: CG, GMRES, BiCGSTAB, ...
Incomplete factorization preconditioners.
Aztec was Sandia's workhorse solver: extracted from the MPSalsa reacting flow code; installed in dozens of Sandia apps; many external licenses.
AztecOO improves on Aztec by: using Epetra objects for defining matrices and vectors; providing more preconditioners & scalings; using C++ class design to enable more sophisticated use.
The AztecOO interface allows continued use of Aztec functionality and the introduction of new solver capabilities outside of Aztec. 54

55 Belos
Next-generation linear iterative solvers.
Decouples algorithms from linear algebra objects: the linear algebra library has full control over data layout and kernels. An improvement over AztecOO, which controlled vector & matrix layout. Essential for hybrid (MPI+X) parallelism.
Solves problems that apps really want to solve, faster:
Multiple right-hand sides: AX = B.
Sequences of related systems: (A + ΔA_k) X_k = B + ΔB_k.
Many advanced methods for these types of systems: block & pseudoblock solvers (GMRES & CG); recycling solvers (GCRO-DR (GMRES) & CG); seed solvers (hybrid GMRES); block orthogonalizations (TSQR).
Supports arbitrary & mixed precision, complex, ...
If you have a choice, pick Belos over AztecOO. 55
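A hedged sketch of a typical Belos solve on Tpetra objects follows; the class and parameter names (Belos::LinearProblem, Belos::PseudoBlockGmresSolMgr, "Maximum Iterations", "Convergence Tolerance") reflect the Belos API as commonly documented, but check your Trilinos version for exact details.

#include <BelosLinearProblem.hpp>
#include <BelosPseudoBlockGmresSolMgr.hpp>
#include <Teuchos_ParameterList.hpp>
#include <Tpetra_CrsMatrix.hpp>

using SC = double;
using MV = Tpetra::MultiVector<SC>;
using OP = Tpetra::Operator<SC>;

void solveWithBelos(Teuchos::RCP<const OP> A, Teuchos::RCP<MV> X,
                    Teuchos::RCP<const MV> B) {
  // Belos sees only abstract operator/multivector interfaces.
  auto problem = Teuchos::rcp(new Belos::LinearProblem<SC, MV, OP>(A, X, B));
  problem->setProblem(); // must be called once A, X, B are set

  auto params = Teuchos::parameterList();
  params->set("Maximum Iterations", 500);
  params->set("Convergence Tolerance", 1.0e-8);

  Belos::PseudoBlockGmresSolMgr<SC, MV, OP> solver(problem, params);
  Belos::ReturnType result = solver.solve(); // Belos::Converged on success
  (void)result;
}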

56 Ifpack(2): Algebraic Preconditioners
Preconditioners: overlapping domain decomposition; incomplete factorizations (within an MPI process); (block) relaxations & Chebyshev.
Accepts the user matrix via an abstract matrix interface; uses {E,T}petra for basic matrix/vector calculations.
Perturbation stabilizations & condition estimation.
Can be used by all other Trilinos solver packages.
Ifpack2: the Tpetra version of Ifpack. Supports arbitrary precision & complex arithmetic. Path forward to hybrid-parallel factorizations. 56
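The lifecycle that makes Ifpack(2) usable from any solver package is setParameters / initialize / compute. A hedged sketch for Ifpack2; the factory call, template defaults, and the ILUT option name are assumptions about your Trilinos version:

#include <Ifpack2_Factory.hpp>
#include <Tpetra_CrsMatrix.hpp>

using row_matrix_type = Tpetra::RowMatrix<double>;

auto makeIlut(Teuchos::RCP<const row_matrix_type> A) {
  // The factory picks the preconditioner class by name.
  auto prec = Ifpack2::Factory::create("ILUT", A);

  Teuchos::ParameterList params;
  params.set("fact: ilut level-of-fill", 2.0); // assumed ILUT option name
  prec->setParameters(params);

  prec->initialize(); // symbolic setup: matrix structure only
  prec->compute();    // numeric setup: uses current matrix values
  return prec;        // ready for apply() inside a Krylov solver
}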

57 Amesos2
Direct solver interface for the Tpetra stack.
Typical usage: preOrdering(), symbolicFactorization(), numericFactorization(), solve().
Easy to support new solvers (current support for all the SuperLU variants).
Easy to support new multivectors and sparse matrices.
Can support third-party solver-specific parameters with little change.
Available in the current release of Trilinos.
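In code, the typical usage pattern above looks roughly like this hedged sketch; "SuperLU" must be an enabled third-party solver in your Trilinos build, and exact headers/typedefs may differ by version:

#include <Amesos2.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_MultiVector.hpp>

using MAT = Tpetra::CrsMatrix<double>;
using MV  = Tpetra::MultiVector<double>;

void directSolve(Teuchos::RCP<const MAT> A, Teuchos::RCP<MV> X,
                 Teuchos::RCP<const MV> B) {
  auto solver = Amesos2::create<MAT, MV>("SuperLU", A, X, B);
  solver->preOrdering();            // optional fill-reducing ordering
  solver->symbolicFactorization();  // analyze structure once
  solver->numericFactorization();   // factor values (repeat if values change)
  solver->solve();                  // triangular solves for this B
}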

58 ML: Multi-level Preconditioners
Smoothed aggregation, multigrid, & domain decomposition.
Critical technology for scalable performance of many apps.
ML is compatible with other Trilinos packages: accepts Epetra sparse matrices & dense vectors; ML preconditioners can be used by AztecOO, Belos, & Anasazi. Can also be used independent of other Trilinos packages.
Next-generation version of ML: MueLu. Works with Epetra or Tpetra objects (via the Xpetra interface). 58

59 MueLu: Next-Gen Algebraic Multigrid
Motivation for replacing ML:
Improve maintainability & ease development of new algorithms.
Decouple computational kernels from algorithms: ML is mostly monolithic (~50K lines of code); MueLu relies more on other Trilinos packages.
Exploit Tpetra features: MPI+X (the Kokkos programming model mitigates risk); 64-bit global indices (to solve problems with >2B unknowns); arbitrary Scalar types (Tramonto runs MueLu with double-double). Works with Epetra or Tpetra (via the Xpetra common interface).
Facilitate algorithm development: energy minimization methods; geometric or classic algebraic multigrid; mix methods together.
Better support for preconditioner reuse: explore options between "blow it away" & "reuse without change". 59

60 ShyLU and Subdomain Solvers: Overview
Stack: Amesos2 and Ifpack2 sit above ShyLU, which builds on KLU2, Basker, Tacho, Fast-ILU, and KokkosKernels (SGS, triangular solve (HTS)).
MPI+X based subdomain solvers. Decouple the notion of one MPI rank as one subdomain: subdomains can span multiple MPI ranks, each with its own subdomain solver using X or MPI+X.
Subpackages of ShyLU, with multiple Kokkos-based options for on-node parallelism:
Basker: LU or ILU(t) factorization.
Tacho: incomplete Cholesky, IC(k).
Fast-ILU: fast ILU factorization for GPUs.
KokkosKernels: coloring-based Gauss-Seidel (M. Deveci), triangular solves (A. Bradley), lots more work (with Christian Trott).
Under active development. Jointly funded by ASC, ATDM, FASTMath, LDRD.

61 Abstract solver interfaces & applications 61

62 Stratimikos Package
Uniform run-time interface to many different packages:
Linear solvers: Amesos, AztecOO, Belos, ...
Preconditioners: Ifpack, ML, ...
Defines a common interface to create and use linear solvers.
Reads in options through a Teuchos::ParameterList. Can change the solver and its options at run time. Can validate options, & read them from a string or XML file.
Accepts any linear system objects that provide an E/Tpetra_Operator / E/Tpetra_RowMatrix view of the matrix, and vector views (e.g., E/Tpetra_MultiVector) for the right-hand side and initial guess.

63 Stratimikos Parameter List and Sublists

<ParameterList name="Stratimikos">
  <Parameter name="Linear Solver Type" type="string" value="AztecOO"/>
  <Parameter name="Preconditioner Type" type="string" value="Ifpack"/>
  <ParameterList name="Linear Solver Types">
    <ParameterList name="Amesos">
      <Parameter name="Solver Type" type="string" value="Klu"/>
      <ParameterList name="Amesos Settings">
        <Parameter name="MatrixProperty" type="string" value="general"/>
        ...
        <ParameterList name="Mumps"> ... </ParameterList>
        <ParameterList name="Superludist"> ... </ParameterList>
      </ParameterList>
    </ParameterList>
    <ParameterList name="AztecOO">
      <ParameterList name="Forward Solve">
        <Parameter name="Max Iterations" type="int" value="400"/>
        <Parameter name="Tolerance" type="double" value="1e-06"/>
        <ParameterList name="AztecOO Settings">
          <Parameter name="Aztec Solver" type="string" value="GMRES"/>
          ...
        </ParameterList>
      </ParameterList>
      ...
    </ParameterList>
    <ParameterList name="Belos"> ... (details omitted) ... </ParameterList>
  </ParameterList>
  <ParameterList name="Preconditioner Types">
    <ParameterList name="Ifpack">
      <Parameter name="Prec Type" type="string" value="ILU"/>
      <Parameter name="Overlap" type="int" value="0"/>
      <ParameterList name="Ifpack Settings">
        <Parameter name="fact: level-of-fill" type="int" value="0"/>
        ...
      </ParameterList>
    </ParameterList>
    <ParameterList name="ML"> ... (details omitted) ... </ParameterList>
  </ParameterList>
</ParameterList>

Top-level parameters select the linear solver and preconditioner; sublists are passed on to package code. Every parameter and sublist is handled by Thyra code and is fully validated.

64 Stratimikos Parameter List and Sublists (continued)
The same parameter list as the previous slide, but with:
<Parameter name="Linear Solver Type" type="string" value="Belos"/>
<Parameter name="Preconditioner Type" type="string" value="ML"/>
The solver/preconditioner is changed by a single argument. The parameter list is standard XML and can be read from the command line, a file, or a string, or hand-coded.

65 Stratimikos Details
Stratimikos has just one primary class: Stratimikos::DefaultLinearSolverBuilder. An instance of this class accepts a parameter list that defines:
Linear solver: Amesos, AztecOO, Belos.
Preconditioner: Ifpack, ML, AztecOO.
Albany and other apps access solvers through Stratimikos.
The parameter list is standard XML and can be read from the command line, read from a file, passed in as a string, defined interactively, or hand-coded in source code. 65
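A hedged sketch of the C++ side: build the one primary class, hand it the parameter list (here read from an XML file named stratimikos.xml, a hypothetical file name), and ask for a solver strategy:

#include <Stratimikos_DefaultLinearSolverBuilder.hpp>
#include <Teuchos_XMLParameterListHelpers.hpp>

void buildSolverFactory() {
  Stratimikos::DefaultLinearSolverBuilder builder;

  // Any of the sources listed above works; here, an XML file.
  auto params = Teuchos::getParametersFromXmlFile("stratimikos.xml");
  builder.setParameterList(params);

  // Creates the solver strategy named by "Linear Solver Type".
  Teuchos::RCP<Thyra::LinearOpWithSolveFactoryBase<double>> lowsFactory =
      builder.createLinearSolveStrategy("");
  (void)lowsFactory; // use with Thyra operator/vector objects
}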

66 A Glimpse at Current and Future Work

67 Node-level concurrency

68 Trilinos Linear Solvers
Sparse linear algebra (Kokkos/KokkosKernels/Tpetra): threaded construction; sparse graphs, (block) sparse matrices, dense vectors; parallel solve kernels; parallel communication & redistribution.
Iterative (Krylov) solvers (Belos): CG, GMRES, TFQMR, recycling methods.
Sparse direct solvers (Amesos2).
Algebraic iterative methods (Ifpack2): Jacobi, SOR, polynomial, incomplete factorizations, additive Schwarz.
Shared-memory factorizations (ShyLU): LU, ILU(k), ILUt, IC(k), iterative ILU(k); direct+iterative preconditioners.
Segregated block solvers (Teko).
Algebraic multigrid (MueLu).
Foundation: KokkosKernels. 68

69 Must Support >3 Architectures
Coming systems to support: Trinity (Intel Haswell & KNL); Sierra (NVIDIA GPUs + IBM multicore CPUs); plus everything else.
3 different architectures: multicore CPUs (big cores), manycore CPUs (small cores), GPUs (highly parallel).
MPI only, & MPI + threads. Threads don't always pay on non-GPU architectures today; porting to threads must not slow down the MPI-only case. 69

70 Kokkos: Performance, Portability, & Productivity
(Diagram: LAMMPS, Trilinos, Sierra, and Albany all sit on top of Kokkos, which targets diverse memory systems: HBM, DDR, ...) 70

71 Kokkos: Common C++-based programming model for thread parallelism on GPUs, CPUs, ...
Parallel {for, reduce, scan} with custom user code.
Exposes different levels of parallelism: flat [0,N), or hierarchical (team, thread, vector). Experimental task parallelism too!
Different memory & execution spaces: control where data live & code executes; enable hybrid (host + GPU) parallelism.
Multidimensional arrays (Kokkos::View) with slices: decouple array layout (row/column-major, tiled, ...) from the app; the default layout is optimized for the architecture (SoA / AoS); unified interface to shared memory, texture fetch, atomic access, ...
Goal: write code once, run well on many different back-ends. 71
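A minimal sketch of these constructs: a Kokkos::View plus flat parallel_for and parallel_reduce, which compile unchanged for the serial, OpenMP, or CUDA back-ends:

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 1000000;
    Kokkos::View<double*> x("x", N), y("y", N); // layout chosen per back-end

    // Flat [0,N) parallelism; the same code runs on CPU threads or a GPU.
    Kokkos::parallel_for("fill", N, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0;
      y(i) = 2.0;
    });

    double result = 0.0;
    Kokkos::parallel_reduce("dot", N, KOKKOS_LAMBDA(const int i, double& sum) {
      sum += x(i) * y(i);
    }, result);
  }
  Kokkos::finalize();
  return 0;
}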

72 Kokkos protects us against:
Hardware divergence.
Programming model diversity.
No threads at all: the Kokkos::Serial back-end.
Kokkos semantics require vectorizable (ivdep) loops: expose parallelism to exploit later.
The hierarchical parallelism model encourages exploiting locality.
Kokkos protects our HUGE time investment in porting Trilinos. Kokkos is our hedge. 72

73 Foundational Technology: KokkosKernels
Provides BLAS (1, 2, 3), sparse, graph, and tensor kernels.
Kokkos-based: performance portable. Interfaces to vendor libraries where applicable (MKL, cuSPARSE, ...).
Goal: provide kernels for all levels of the node hierarchy:
Socket | thread teams, shared L3 | e.g., full solve.
Core | thread parallelism, shared L1/L2 | e.g., subdomain solve.
Hyperthread | vector parallelism, synch-free | e.g., matrix row x vector.
Vector lane | elemental functions, serial | e.g., 3x3 DGEMM. 73
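A hedged sketch of calling the KokkosKernels sparse matrix-vector kernel (y = alpha*A*x + beta*y); the header and overload names follow KokkosSparse as commonly documented, but verify against your KokkosKernels version:

#include <KokkosSparse_CrsMatrix.hpp>
#include <KokkosSparse_spmv.hpp>

using device_type = Kokkos::DefaultExecutionSpace::device_type;
using matrix_type = KokkosSparse::CrsMatrix<double, int, device_type>;

void applyA(const matrix_type& A, const Kokkos::View<double*>& x,
            const Kokkos::View<double*>& y) {
  const double alpha = 1.0, beta = 0.0;
  // "N" means no transpose; the kernel dispatches to a vendor library
  // (e.g., MKL or cuSPARSE) when one is enabled, else to Kokkos code.
  KokkosSparse::spmv("N", alpha, A, x, beta, y);
}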

74 Performance Portability Example: SpMV
A single implementation with parameterized hierarchical parallelism beats vendor libraries for relevant matrix sizes. (Chart: normalized time of Kokkos vs. MKL and Kokkos vs. cuSPARSE, for all matrices and for matrices > 7 MB.) 74

75 Nalu: a CFD Application as Prototype
Nalu production (master): patterns and data structures shared with IC production apps; uses the Trilinos solver stack; scaling demonstrated to >500k cores; 2D/3D sliding/overset mesh; multiphysics CHT.
Migrating Nalu with Kokkos: prototype Kokkos assembly; used to harden the threaded solver stack; get Nalu ready for the Trinity Phase II demo. 75

76 Making Matrix Assembly Thread Scalable
Simple heat conduction problem, 2M elements, strong scaling. (Chart: a portability gap of roughly 2.5x and 1.5x between architectures.) 76

77 Containers

78 Trilinos Usage via Docker
WebTrilinos tutorial:

docker pull johntfoster/trilinos
docker pull johntfoster/peridigm
docker run --name peridigm0 -d -v `pwd`:/output johntfoster/peridigm \
    Peridigm fragmenting_cylinder.peridigm

Etc.
Containerization technology. Think: virtual machine, but lightweight and native. Composable, layered. Driven by the data science community.

79 First Docker MPI Results (Sean Deal)
SJU cluster, Epetra basic performance test: MatVec, lower solve, Norm2, Dot, Update. Harmonic mean of 5 tests, 4M equations per process, 48 MPI ranks. (Chart compares native vs. Docker performance.) 79

80 Future Docker/Shifter Efforts
Standard developer environment: enables reproducibility (error states).
Contains all third-party libraries: Trilinos can use dozens of TPLs (SuperLU, MUMPS, ParMETIS, etc.; MKL? Licensing?).
Contains one or a few compiler versions: enables uniform error detection/correction; no ambiguity about the build/test environment.
Productivity improvement: libraries pre-built (a full build can take hours).
Specialized containers: Shifter, Docker for supercomputing; easy access to GPUs. 80

81 The Extreme-Scale Scientific Software Development Kit (xsdk)

Extreme-scale Science Applications and the Extreme-scale Scientific Software Ecosystem (the same application/ecosystem diagram as slide 6). Focus of key accomplishments: xsdk foundations.

83 Building the foundation of a highly effective extreme-scale scientific software ecosystem 83
Focus: increasing the functionality, quality, and interoperability of important scientific libraries, domain components, and development tools.
xsdk release 0.2.0: April 2017 (soon). Website: xsdk.info.
Spack package installation: ./spack install xsdk
Package interoperability:
Numerical libraries: hypre, PETSc, SuperLU, Trilinos.
Domain components: Alquimia, PFLOTRAN.
xsdk community policies: address challenges in interoperability and sustainability of software developed by diverse groups at different institutions.
Impact: improved code quality, usability, access, sustainability. Inform potential users that an xsdk member package can be easily used with other xsdk packages. Foundation for work on performance portability and deeper levels of package interoperability.

84 xsdk Community Installation Policies: GNU Autoconf and CMake Options
Motivation: obtaining, configuring, and installing multiple independent software packages is tedious and error prone. Need consistency of compiler (+version, options), third-party packages, etc.
Approach: define xsdk community installation policies to which all xsdk packages will subscribe and be tested. Do not require all packages to use the same installation software, merely that they follow the same interface; maximum flexibility for each package to choose its own toolchain.
Policies: a standard subset of configure and CMake options for xsdk and other HPC packages, to make configuration and installation as efficient as possible on common platforms, including standard Linux distributions and Mac OS X, as well as target machines at DOE computing facilities (ALCF, NERSC, and OLCF).
Topics include: selecting compilers and compiler flags, creating packages with debugging information, building shared libraries, building interfaces for a particular language, determining precision, determining index size, setting the location of BLAS and LAPACK, and determining other packages and include directories.

85 xsdk Community Policies (Draft 0.3, Dec)
xsdk compatible package: must satisfy the mandatory xsdk policies:
M1. Support xsdk community GNU Autoconf or CMake options.
M2. Provide a comprehensive test suite.
M3. Employ a user-provided MPI communicator.
M4. Give best effort at portability to key architectures.
M5. Provide a documented, reliable way to contact the development team.
M6. Respect system resources and settings made by other previously called packages.
M7. Come with an open source license.
M8. Provide a runtime API to return the current version number of the software.
M9. Use a limited and well-defined symbol, macro, library, and include-file name space.
M10. Provide an accessible repository (not necessarily publicly available).
M11. Have no hardwired print or I/O statements.
M12. Allow installing, building, and linking against an outside copy of external software.
M13. Install headers and libraries under <prefix>/include/ and <prefix>/lib/.
M14. Be buildable using 64-bit pointers; 32-bit is optional.
Recommended policies, currently encouraged but not required:
R1. Have a public repository.
R2. Be possible to run the test suite under valgrind in order to test for memory corruption issues.
R3. Adopt and document a consistent system for error conditions/exceptions.
R4. Free all system resources as soon as they are no longer needed.
R5. Provide a mechanism to export an ordered list of library dependencies.
xsdk member package: an xsdk-compatible package that uses or can be used by another package in the xsdk, where the connecting interface is regularly tested for regressions.

86 xsdk Package Interoperability (Release 0.2.0, April 2017 (soon))
xsdk numerical libraries:
hypre: high-performance preconditioners for the solution of large, sparse linear systems, featuring a semi-structured interface.
PETSc: suite of data structures and routines for the scalable solution of PDE-based applications. Includes linear solvers, preconditioners, nonlinear solvers, ODE integrators, and optimization solvers (TAO).
SuperLU: direct solvers for large, sparse, nonsymmetric linear systems on distributed-memory systems with hybrid node architectures.
Trilinos: collection of approximately 60 packages of reusable scientific software. Capabilities include geometry, meshing, discretization, partitioning and load balancing; parallel construction of distributed matrices, graphs, and vectors; parallel, scalable solution of large linear, nonlinear, and transient systems of equations; and solution of embedded optimization and UQ problems.
xsdk domain components:
Alquimia: biogeochemistry interface and wrapper.
PFLOTRAN: reactive flow and transport modeling for surface and subsurface processes.
Package installation:
Spack: package management tool designed to support multiple versions and configurations of software on a wide variety of platforms and environments.
Levels of package interoperability:
Level 1: both packages can be used (side by side) in an application.
Level 2: the libraries can exchange data (or control data) with each other.
Level 3: each library can call the other library to perform unique computations.
Great multi-institutional teamwork! Release lead: Jim Willenbring. 86

87 xsdk release 0.2.0: packages can be readily used in combination by multiphysics, multiscale apps.
Notation: A -> B means A can use B to provide functionality on behalf of A.
(Diagram: Applications A and B and a multiphysics Application C sit on top of the xsdk packages Alquimia, PFLOTRAN, hypre, PETSc, SuperLU, and Trilinos, plus external software such as HDF5 and BLAS, with room for more contributed domain components and libraries.)
xsdk functionality, February 2017: tested on key machines at ALCF, NERSC, and OLCF, and also on Linux and Mac OS X. 87

88 More xsdk Info
Paper: "xsdk Foundations: Toward an Extreme-scale Scientific Software Development Kit," R. Bartlett, I. Demeshko, T. Gamblin, G. Hammond, M. Heroux, J. Johnson, A. Klinvex, X. Li, L.C. McInnes, D. Osei-Kuffuor, J. Sarich, B. Smith, J. Willenbring, U.M. Yang. To appear in Supercomputing Frontiers and Innovations, 2017.
CSE17 posters: "xsdk: Working toward a Community Software Ecosystem"; "Managing the Software Ecosystem with Spack". 88

89 xsdk: Next Steps
xsdk4ecp: enhancements needed for exascale applications: coordinated use of on-node resources; integrated execution; control inversion and adaptive execution strategies; coordinated and sustainable documentation, testing, packaging, and deployment.
Packages working toward xsdk compatibility:
Chombo: software for adaptive solution of PDEs; compatible with all xsdk community policies.
ALExa: Accelerated Libraries for Exascale (AMP, DTK, TASMANIAN).
Dense linear algebra packages: MAGMA, PLASMA, DPLASMA, ScaLAPACK, LAPACK.
SUNDIALS: CVODE(S) and IDA(S) multistep ODE and DAE time integrators (with sensitivities), ARKode multistage IMEX integrator, KINSOL nonlinear solver. 89

90 Your Turn: Collaboration Strategies Scenario: You are part of a new development group, coming together from diverse backgrounds. What specific activities and goals could help the team succeed? 90

91 Your Turn: Collaboration Strategies
Scenario: You are part of a new development group, coming together from diverse backgrounds. What specific activities and goals could help the team succeed?
Don't assume everything will go fine just because you are all great people.
Catalog existing approaches for development practices and processes.
Identify maturity gaps that may lead to frustration. Example: someone does not use source management, others do.
Develop policies for the new team. Declare expected behaviors and practices. Be rigorous and specific.
Establish a plan for all team members to conform to the new team policies. Track progress. 91

92 Final Take-Away Points
Some knowledge of data structures is important; they serve as a reference for other data structures. Example: compressed row storage.
Trilinos is investing heavily in algorithms and software for next-generation systems. We are fortunate to have good funding to explore and implement next-gen. Your investment in next-gen capabilities should wait until you are ready; prepare by learning fundamental concepts while library software matures.
Trilinos, and now the xsdk, are large collections of scientific libraries. Lots to learn, but lots of capabilities, with integrated support as your modeling and simulation matures.
There's a library for that: most linear algebra problems can be solved using a robust, parallel library. Many other libraries exist: become a software archeologist first! Even if you need to write your own, you can use libraries for some steps. 92

93 After You Leave Visit the Trilinos Tutorial site: OnTutorial Use the web portal: /c++/index.html Visit the xsdk site: 93


Ifpack2 User s Guide 1.0 (Trilinos version 12.6) SANDIA REPORT SAND2016-5338 Unlimited Release Printed June 2016 Ifpack2 User s Guide 1.0 (Trilinos version 12.6) Andrey Prokopenko, Christopher M. Siefert, Jonathan J. Hu, Mark Hoemmen, Alicia Klinvex

More information

PETSc Satish Balay, Kris Buschelman, Bill Gropp, Dinesh Kaushik, Lois McInnes, Barry Smith

PETSc   Satish Balay, Kris Buschelman, Bill Gropp, Dinesh Kaushik, Lois McInnes, Barry Smith PETSc http://www.mcs.anl.gov/petsc Satish Balay, Kris Buschelman, Bill Gropp, Dinesh Kaushik, Lois McInnes, Barry Smith PDE Application Codes PETSc PDE Application Codes! ODE Integrators! Nonlinear Solvers,!

More information

Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures

Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Photos placed in horizontal position with even amount of white space between photos and header Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Christopher Forster,

More information

Sampling Using GPU Accelerated Sparse Hierarchical Models

Sampling Using GPU Accelerated Sparse Hierarchical Models Sampling Using GPU Accelerated Sparse Hierarchical Models Miroslav Stoyanov Oak Ridge National Laboratory supported by Exascale Computing Project (ECP) exascaleproject.org April 9, 28 Miroslav Stoyanov

More information

Parallel resolution of sparse linear systems by mixing direct and iterative methods

Parallel resolution of sparse linear systems by mixing direct and iterative methods Parallel resolution of sparse linear systems by mixing direct and iterative methods Phyleas Meeting, Bordeaux J. Gaidamour, P. Hénon, J. Roman, Y. Saad LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix

More information

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Speedup Altair RADIOSS Solvers Using NVIDIA GPU Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair

More information

Intel Performance Libraries

Intel Performance Libraries Intel Performance Libraries Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation

More information

MAGMA: a New Generation

MAGMA: a New Generation 1.3 MAGMA: a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Jack Dongarra T. Dong, M. Gates, A. Haidar, S. Tomov, and I. Yamazaki University of Tennessee, Knoxville Release

More information

Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development

Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development M. Serdar Celebi

More information

Install your scientific software stack easily with Spack

Install your scientific software stack easily with Spack Install your scientific software stack easily with Spack Les mardis du développement technologique Florent Pruvost (SED) Outline 1. Context 2. Features overview 3. In practice 4. Some feedback Florent

More information

Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation

Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M

More information

CSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices

CSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices CSE 599 I Accelerated Computing - Programming GPUS Parallel Pattern: Sparse Matrices Objective Learn about various sparse matrix representations Consider how input data affects run-time performance of

More information

GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)

GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization

More information

Supercomputing and Science An Introduction to High Performance Computing

Supercomputing and Science An Introduction to High Performance Computing Supercomputing and Science An Introduction to High Performance Computing Part VII: Scientific Computing Henry Neeman, Director OU Supercomputing Center for Education & Research Outline Scientific Computing

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 15 Numerically solve a 2D boundary value problem Example:

More information

Solving Dense Linear Systems on Graphics Processors

Solving Dense Linear Systems on Graphics Processors Solving Dense Linear Systems on Graphics Processors Sergio Barrachina Maribel Castillo Francisco Igual Rafael Mayo Enrique S. Quintana-Ortí High Performance Computing & Architectures Group Universidad

More information

ML 3.1 Smoothed Aggregation User s Guide

ML 3.1 Smoothed Aggregation User s Guide SAND2004 4819 Unlimited Release Printed September 2004 ML 3.1 Smoothed Aggregation User s Guide Marzio Sala Computational Math & Algorithms Sandia National Laboratories P.O. Box 5800 Albuquerque, NM 87185-1110

More information

Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program

Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program C.T. Vaughan, D.C. Dinge, P.T. Lin, S.D. Hammond, J. Cook, C. R. Trott, A.M. Agelastos, D.M. Pase, R.E. Benner,

More information

Harp-DAAL for High Performance Big Data Computing

Harp-DAAL for High Performance Big Data Computing Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big

More information

What is DARMA? DARMA is a C++ abstraction layer for asynchronous many-task (AMT) runtimes.

What is DARMA? DARMA is a C++ abstraction layer for asynchronous many-task (AMT) runtimes. DARMA Janine C. Bennett, Jonathan Lifflander, David S. Hollman, Jeremiah Wilke, Hemanth Kolla, Aram Markosyan, Nicole Slattengren, Robert L. Clay (PM) PSAAP-WEST February 22, 2017 Sandia National Laboratories

More information

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,

More information

MueLu - AMG Design and Extensibility

MueLu - AMG Design and Extensibility MueLu - AMG Design and Extensibility Tobias Wiesner Andrey Prokopenko Jonathan Hu Sandia National Labs March 3, 2015 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM

Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Alexander Monakov, amonakov@ispras.ru Institute for System Programming of Russian Academy of Sciences March 20, 2013 1 / 17 Problem Statement In OpenFOAM,

More information

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations Hartwig Anzt 1, Marc Baboulin 2, Jack Dongarra 1, Yvan Fournier 3, Frank Hulsemann 3, Amal Khabou 2, and Yushan Wang 2 1 University

More information

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one

More information

Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing

Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Erik G. Boman 1, Umit V. Catalyurek 2, Cédric Chevalier 1, Karen D. Devine 1, Ilya Safro 3, Michael M. Wolf

More information

PARDISO - PARallel DIrect SOlver to solve SLAE on shared memory architectures

PARDISO - PARallel DIrect SOlver to solve SLAE on shared memory architectures PARDISO - PARallel DIrect SOlver to solve SLAE on shared memory architectures Solovev S. A, Pudov S.G sergey.a.solovev@intel.com, sergey.g.pudov@intel.com Intel Xeon, Intel Core 2 Duo are trademarks of

More information

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager Intel Math Kernel Library (Intel MKL) BLAS Victor Kostin Intel MKL Dense Solvers team manager Intel MKL BLAS/Sparse BLAS Original ( dense ) BLAS available from www.netlib.org Additionally Intel MKL provides

More information

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,

More information

Chris Baker. Mike Heroux Mike Parks Heidi Thornquist (Lead)

Chris Baker. Mike Heroux Mike Parks Heidi Thornquist (Lead) Belos: Next-Generation Iterative e Solvers 2009 Trilinos User Group Meeting November 4, 2009 Chris Baker David Day Mike Heroux Mike Parks Heidi Thornquist (Lead) SAND 2009-7105P Sandia is a multiprogram

More information

Parallel Implementations of Gaussian Elimination

Parallel Implementations of Gaussian Elimination s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n

More information

MAGMA. LAPACK for GPUs. Stan Tomov Research Director Innovative Computing Laboratory Department of Computer Science University of Tennessee, Knoxville

MAGMA. LAPACK for GPUs. Stan Tomov Research Director Innovative Computing Laboratory Department of Computer Science University of Tennessee, Knoxville MAGMA LAPACK for GPUs Stan Tomov Research Director Innovative Computing Laboratory Department of Computer Science University of Tennessee, Knoxville Keeneland GPU Tutorial 2011, Atlanta, GA April 14-15,

More information

Lecture 9. Introduction to Numerical Techniques

Lecture 9. Introduction to Numerical Techniques Lecture 9. Introduction to Numerical Techniques Ivan Papusha CDS270 2: Mathematical Methods in Control and System Engineering May 27, 2015 1 / 25 Logistics hw8 (last one) due today. do an easy problem

More information

LAPACK. Linear Algebra PACKage. Janice Giudice David Knezevic 1

LAPACK. Linear Algebra PACKage. Janice Giudice David Knezevic 1 LAPACK Linear Algebra PACKage 1 Janice Giudice David Knezevic 1 Motivating Question Recalling from last week... Level 1 BLAS: vectors ops Level 2 BLAS: matrix-vectors ops 2 2 O( n ) flops on O( n ) data

More information

Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker)

Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM Conference on Parallel

More information

Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh.

Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh. Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization Dmitry I. Lyakh liakhdi@ornl.gov This research used resources of the Oak Ridge Leadership Computing Facility at the

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision At-A-Glance Unified Computing Realized Today, IT organizations assemble their data center environments from individual components.

More information

Practical High Performance Computing

Practical High Performance Computing Practical High Performance Computing Donour Sizemore July 21, 2005 2005 ICE Purpose of This Talk Define High Performance computing Illustrate how to get started 2005 ICE 1 Preliminaries What is high performance

More information

Computational Aspects and Recent Improvements in the Open-Source Multibody Analysis Software MBDyn

Computational Aspects and Recent Improvements in the Open-Source Multibody Analysis Software MBDyn Computational Aspects and Recent Improvements in the Open-Source Multibody Analysis Software MBDyn Pierangelo Masarati, Marco Morandini, Giuseppe Quaranta and Paolo Mantegazza Dipartimento di Ingegneria

More information

Solvers and partitioners in the Bacchus project

Solvers and partitioners in the Bacchus project 1 Solvers and partitioners in the Bacchus project 11/06/2009 François Pellegrini INRIA-UIUC joint laboratory The Bacchus team 2 Purpose Develop and validate numerical methods and tools adapted to problems

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Mathematical Libraries and Application Software on JUQUEEN and JURECA

Mathematical Libraries and Application Software on JUQUEEN and JURECA Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries and Application Software on JUQUEEN and JURECA JSC Training Course May 2017 I.Gutheil Outline General Informations Sequential Libraries Parallel

More information

PARALUTION - a Library for Iterative Sparse Methods on CPU and GPU

PARALUTION - a Library for Iterative Sparse Methods on CPU and GPU - a Library for Iterative Sparse Methods on CPU and GPU Dimitar Lukarski Division of Scientific Computing Department of Information Technology Uppsala Programming for Multicore Architectures Research Center

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new

More information

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC 2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy

More information

Mathematical Libraries and Application Software on JUQUEEN and JURECA

Mathematical Libraries and Application Software on JUQUEEN and JURECA Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries and Application Software on JUQUEEN and JURECA JSC Training Course November 2015 I.Gutheil Outline General Informations Sequential Libraries Parallel

More information

Evaluation of Intel Memory Drive Technology Performance for Scientific Applications

Evaluation of Intel Memory Drive Technology Performance for Scientific Applications Evaluation of Intel Memory Drive Technology Performance for Scientific Applications Vladimir Mironov, Andrey Kudryavtsev, Yuri Alexeev, Alexander Moskovsky, Igor Kulikov, and Igor Chernykh Introducing

More information

Optimising the Mantevo benchmark suite for multi- and many-core architectures

Optimising the Mantevo benchmark suite for multi- and many-core architectures Optimising the Mantevo benchmark suite for multi- and many-core architectures Simon McIntosh-Smith Department of Computer Science University of Bristol 1 Bristol's rich heritage in HPC The University of

More information

Solving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems

Solving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination

More information

Enabling Next-Generation Parallel Circuit Simulation with Trilinos

Enabling Next-Generation Parallel Circuit Simulation with Trilinos Enabling Next-Generation Parallel Circuit Simulation with Trilinos Chris Baker 1, Erik Boman 2, Mike Heroux 2, Eric Keiter 2, Siva Rajamanickam 2, Rich Schiek 2, and Heidi Thornquist 2 1 Oak Ridge National

More information

Distributed NVAMG. Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs

Distributed NVAMG. Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs Distributed NVAMG Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs Istvan Reguly (istvan.reguly at oerc.ox.ac.uk) Oxford e-research Centre NVIDIA Summer Internship

More information

First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster

First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster YALES2: Semi-industrial code for turbulent combustion and flows Jean-Matthieu Etancelin, ROMEO, NVIDIA GPU Application

More information

High Performance Computing Software Development Kit For Mac OS X In Depth Product Information

High Performance Computing Software Development Kit For Mac OS X In Depth Product Information High Performance Computing Software Development Kit For Mac OS X In Depth Product Information 2781 Bond Street Rochester Hills, MI 48309 U.S.A. Tel (248) 853-0095 Fax (248) 853-0108 support@absoft.com

More information

Tools and Primitives for High Performance Graph Computation

Tools and Primitives for High Performance Graph Computation Tools and Primitives for High Performance Graph Computation John R. Gilbert University of California, Santa Barbara Aydin Buluç (LBNL) Adam Lugowski (UCSB) SIAM Minisymposium on Analyzing Massive Real-World

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 1 Don t you just invert

More information