Managing Complexity in the Parallel Sparse Grid Combination Technique

Size: px

Start display at page:

Download "Managing Complexity in the Parallel Sparse Grid Combination Technique"

Lorena Taylor
5 years ago
Views:

Southern 3 1 Mathematical Sciences Institute, The Australian National University 2 Research

1 Managing Complexity in the Parallel Sparse Grid Combination Technique J. W. Larson 1 P. E. Strazdins 2 M. Hegland 1 B. Harding 1 S. Roberts 1 L. Stals 1 A. P. Rendell 2 M. Ali 2 J. Southern 3 1 Mathematical Sciences Institute, The Australian National University 2 Research School of Computer Science, The Australian National University 3 Fujitsu Laboratories, Europe July 19, 2016

2 outline the emerging hpc landscape Ultraproblems at Ultrascale Faults and Fault-Tolerant Techniques (FT) understanding complexity so we can manage it Sparse Grids The Sparse Grid Combination Technique Complexity Metrics Implications for the Parallel SGCT managing complexity Numerical MapReduce Framework (NuMRF) Parallel SGCT Implementation

3 part i: the emerging hpc landscape

will become routine for large-scale applications running on these platforms This means.

4 the road from petascale to exascale Ultra-parallelism > O(10 6 ) cores High scaling efficiency required Large number of hardware components, each with a finite probability of failure Hardware faults such as node failures will become routine for large-scale applications running on these platforms This means... Future ultrascale applications must embody FT Run successfully through node failures Recover from faults or at least checkpoint/exit gracefully

fault recovery and fault-tolerance Technological approaches: Replication/redundancy Runtime checkpointing with recovery through restart/task reassignment Runtime recreation of lost data using

5 fault recovery and fault-tolerance Technological approaches: Replication/redundancy Runtime checkpointing with recovery through restart/task reassignment Runtime recreation of lost data using neighboring data...computational techniques for one mill...billion processing elements! Algorithm-based FT (ABFT): Huang and Abraham (1984): row/column checksums to correct for computational errors Du et al. (2012): checksum-based fail/stop in to LU & QR decompositions Liu (2002); Geist and Engleman (2007): chaotic relaxation Dean and Ghemawat (2004): MapReduce Our group: sparse grid combination method with built-in runtime fault-tolerance

6 part ii: understanding complexity so we can manage it

what is a sparse grid? A solution to a complexity problem: The number of gridpoints on a d-dimensional isotropic grid grows exponentially w.r.t. d This is the curse of dimensionality A sparse grid

7 what is a sparse grid? A solution to a complexity problem: The number of gridpoints on a d-dimensional isotropic grid grows exponentially w.r.t. d This is the curse of dimensionality A sparse grid provides fine-scale resolution in each dimension, but not combined fine scales from all multidimensional subspaces Constructed from a number of coarser component grids that are fine-scale in some dimensions but coarse in others Developed to solve problems in high dimensions

8 sparse grids reduce problem size dramatically F 2 Ld S 2 L L d 1 R C = F S ( 2 L L ) d 1

9 geometric definition of sparse grid a simple sparse grid = sparse grid in frequency / scale space = captures fine scales in both dimensions but not joint fine scales

10 constructing sparse grids from the scale lattice-part i Consider a one-dimensional level L 0 equipartition of a closed interval into 2 L segments including boundaries, this partition results in 2 L + 1 grid points Generalize to a closed box domain of dimensionality d with a d-dimensional tensor product grid of the dimensions grids. result is an isotropic full-grid F having F = (2 L + 1) d grid points Suppose instead, we choose for each dimension 1 j d a partition of level 0 l j L result is a component grid G l of level l having G l = d i=1 (2l i + 1) grid points the index vector l defines a point on the scale lattice L of level L, equivalent to a unique gridded partition of the closed d-dimensional box domain

11 constructing sparse grids from the scale lattice-part ii Each point l L defines a d-dimensional grid G l L, with L the set of all grids generated by the scale lattice. Grids with l 1 = l 2 = = l d are called isotropic The full grid F is isotropic with l 1 = l 2 = = l d = L Level L Sparse grid S is the union of a set G L of component grids The definition of G is defined by constraints C( l) on l S = C( l) G l. for example is the classic combination s constraint on the sum of the level indices l 1 l 1 {L, L 1,..., L d + 1}, L d 1.

12 general combination formulae The classic combination solution fl C ( x) for level L in d dimensions is, in terms of the component grid solutions f l ( x) d 1 L ( x) = ( ) d 1 ( 1) q q f C q=0 l 1 =L q f l ( x) Possible to include m L 1 hyperplanes worth of spare component grids for FT. These spare grids are used only in scenarios of loss of one ore classic combination component grids due to fault(s)

13 classic combination and example ft scenarios classic combination loss of (3, 4) loss of (2, 5)

14 building solvers on sparse grids algorithm 1 Pick a set G of multidimensional, coarser component grids 2 Solve on each component grid G l (interpolate to S ) 3 (Linear) Combination of component grids solutions for solution on S 4 Optional: interpolate solution from S to F 5 Time Evolution/Iteration: propagate solution on S back to each G l G Error bounds for solutions on the sparse grid can be computed based on the scheme used on the component grids and the combination method

15 sgct complexity number of component grids d 1 ( ) L k + d 1 G = d 1 k=0 G FT = ( ) L 1 + d 1 ( ) L 2 d 1 This is the number of M N parallel data transfers in the SGCT

16 what is an M N transfer? Data connections for the 2D level 5 SGCT

17 sgct complexity total number of component gridpoints Aggregate memory usage; (very!) crude measure of cost Aggregate data traffic between the components solvers and the solver for S

18 implications for a parallel sgct complexity analysis tells us... Lossy ABFT overhead is low compared to replication High values of (L, d) will engender numerous component grid tasks high grid data volumes many (parallel) data connections routing data to/from the sparse grid Further modeling required using application- and platform-specific information application performance data hardware characteristics: processor speed, switch latency/bandwidth

19 implications for a parallel sgct, cont d... requirements for a parallel sgct system Low-level automation: Distributed grid/field data description Parallel M N transfer G l S Data transformation (specifically, interpolation) Performance measurement/timing Fault detection/reporting High-level automation: Scheduling of iterative execution of large numbers of tasks Load balance based on task cost model (TCM) Probabilistic Fault Detection (PFD) through predicted/elapsed runtime comparison Automatic coordination of large numbers of M N transfers Monitoring/ explicit fault detection Self-steering using an error quality of service (QoS) model to compute alternative solutions in the event of faults Compatibility with legacy science/engineering codes

20 part iii: managing complexity

21 numrf

22 python grids and fields toolkit (PyGrAFT) PyGrAFT is the data language for NuMRF. It is a system for Representing logically Cartesian grids CartGrid class) Arbitrary dimensionality supported Field data residing on these grids (GriddedScalarField) Implemented using NumPy ndarray Arbitrary dimensionality supported Any NumPy base type supported Any number of fields may be associated with a CartGrid Complete flexibility regarding storage order Expressing multi-resolution relationships (FullGrid and ComponentGrid subclasses) Performing combinations involving component grids. Parallelization currently underway At present, there are numerous test examples. Including generation of most of the sparse grid pictures in this talk.

23 c++ parallel sgct implementation Implemented in three C++ classes: GridCombine2D: Overall combination method Vec2D: Field storage ProcGrid2D: Domain decomposition for each grid Level of abstraction reduced code complexity dramatically! Assumptions: Each component grid G l is distributed over a 2D grid of MPI PE s P l Algorithm uses gather-scatter within each grid s pool Load balance computed with an awareness of computational cost; based on component grids respective (fixed) t Implemented using aggressive defensive programming techniques (cross-checking 2D vector calcuations, etc) Robustness (simplest L = 4 case requires 32 processes!) Rapid development Source only about 1000 lines of code Interoperable with NuMRF via CTypes Performance studies just commencing

24 summary conclusions A complexity analysis of the parallel SGCT has been presented Preliminary results have driven a set of requirements for a parallel SGCT system NuMRF, a MapReduce variant, numerical-analysis-friendly, error/fault aware calling framework has been presented Implemtation of NuMRF s data model (PyGrAFT) well underway, prototyping of framework core in progress A robust parallel SGCT has been built and is entering performance analysis and tuning future work Completion of PyGrAFT: Parallelization, M N services, interpolation services, and sparse grid representation Ongoing development of NuMRF s elements.

25 ende DANKE! FRAGEN?

Large-scale Applications made Fault-tolerant using the Sparse Grid Combination Technique

Large-scale Applications made Fault-tolerant using the Sparse Grid Combination Technique Peter Strazdins* and Mohsin Ali Computer Systems Group, Research School of Computer Science, The Australian National