Radial Basis Function-Generated Finite Differences (RBF-FD): New Opportunities for Applications in Scientific Computing Natasha Flyer National Center for Atmospheric Research Boulder, CO
Meshes vs. Mesh-free discretizations
Structured meshes: FD, DG, FV, Spectral Elements
- Requires domain decomposition / curvilinear mappings
Unstructured meshes: FEM, DG, FV, Spectral Elements
- Improved geometric flexibility; requires triangles, tetrahedra, etc.
Mesh-free: RBF-FD (Radial Basis Function-generated Finite Differences)
- Total geometric flexibility; needs just scattered nodes and no connectivities, e.g. no triangles or mappings
One main evolution path in numerical methods for PDEs:
Finite Differences (FD) (1910): First general numerical approach for solving PDEs. FD weights obtained by using local polynomial approximations.
Pseudospectral (PS) (1970): Can be seen either as the limit of increasing order FD methods, or as approximations by basis functions, such as Fourier or Chebyshev; often very accurate, but low geometric flexibility.
Radial Basis Functions (RBF) (1990): Choose instead as basis functions translates of radially symmetric functions: PS becomes a special case, but now possible to scatter nodes in any number of dimensions, with no danger of singularities.
RBF-FD (2003): Radial Basis Function-generated FD formulas. All approximations again local, but nodes can now be placed freely
- Easy to achieve high orders of accuracy (4th to 8th order)
- Excellent for distributed memory computers / GPUs
- Local node refinement trivial in any number of dimensions (e.g. in 5+ dimensional math finance applications).
The simplest RBF is the function r, the Euclidean distance.
All other RBFs are built on this basic concept.
RBF idea, in pictures (for 2-D scattered data)
- Scattered data within a 2-D region
- Collocate radial basis functions at the data locations (here, rotated Gaussians)
- Find the linear combination of the basis functions that fits all the data
The linear system that arises can never be singular, no matter how many nodes are scattered in any number of dimensions.
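The steps pictured on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: the function name `gaussian_rbf_interpolant`, the shape parameter `eps`, the jittered-grid node set, and the test data are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_rbf_interpolant(nodes, values, eps):
    """Return a callable s(x) interpolating `values` at `nodes` with Gaussian RBFs.

    `eps` is an assumed shape parameter; the slides do not specify one.
    """
    # Collocation matrix: A_ij = exp(-(eps * ||x_i - x_j||)^2)
    r = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
    A = np.exp(-(eps * r) ** 2)
    # Coefficients of the linear combination that fits all the data
    coeffs = np.linalg.solve(A, values)

    def s(x):
        x = np.atleast_2d(x)
        rx = np.linalg.norm(x[:, None, :] - nodes[None, :, :], axis=-1)
        return np.exp(-(eps * rx) ** 2) @ coeffs

    return s

# Hypothetical scattered data: a jittered 5 x 5 grid in the unit square
rng = np.random.default_rng(0)
g = np.linspace(0.1, 0.9, 5)
nodes = np.array([(a, b) for a in g for b in g]) + rng.uniform(-0.05, 0.05, (25, 2))
values = np.sin(2 * np.pi * nodes[:, 0]) * np.cos(2 * np.pi * nodes[:, 1])

s = gaussian_rbf_interpolant(nodes, values, eps=5.0)
max_err = np.max(np.abs(s(nodes) - values))   # residual at the data locations
```

In exact arithmetic the Gaussian collocation matrix is nonsingular for distinct nodes, but it can become badly conditioned in floating point when `eps` is small or nodes nearly coincide, which is why the example keeps the nodes well separated.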
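Simplicity of concept in using RBF-FD
Ex.: Stencil of n = 21 nodes
1. Calculate the distance $r = \|x_i - x_j\|_2$ from each node to all other nodes in the stencil.
Result: Distance matrix BUT ALSO the simplest RBF interpolation matrix. Same as centering the RBF $r$ at each node and evaluating it at all nodes in the stencil.
\[
\begin{bmatrix}
\|x_1 - x_1\|_2 & \|x_1 - x_2\|_2 & \cdots & \|x_1 - x_n\|_2 \\
\vdots & \vdots & \ddots & \vdots \\
\|x_n - x_1\|_2 & \|x_n - x_2\|_2 & \cdots & \|x_n - x_n\|_2
\end{bmatrix}
\]
2. For a higher-order RBF interpolation matrix, e.g. $r^3$: Element-wise cube it!
\[
\begin{bmatrix}
\|x_1 - x_1\|_2^3 & \|x_1 - x_2\|_2^3 & \cdots & \|x_1 - x_n\|_2^3 \\
\vdots & \vdots & \ddots & \vdots \\
\|x_n - x_1\|_2^3 & \|x_n - x_2\|_2^3 & \cdots & \|x_n - x_n\|_2^3
\end{bmatrix}
\]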
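The two steps on this slide map directly onto array operations. The sketch below assumes a hypothetical stencil of 21 random 2-D nodes; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical stencil: n = 21 scattered nodes in 2-D
rng = np.random.default_rng(0)
stencil = rng.uniform(-1.0, 1.0, size=(21, 2))

# Step 1: pairwise Euclidean distances r = ||x_i - x_j||_2.
# This distance matrix is also the interpolation matrix for the RBF phi(r) = r.
diff = stencil[:, None, :] - stencil[None, :, :]
A = np.linalg.norm(diff, axis=-1)

# Step 2: interpolation matrix for the higher-order RBF phi(r) = r^3,
# obtained by cubing every entry.
A3 = A ** 3
```

Any other radial function is applied the same way: evaluate it element-wise on the distance matrix `A`.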
Simplicity of concept in using RBF-FD: Derivative approximation
Ex.: Approximate a derivative operator $L$ at $x_c$, using the stencil of n = 21 nodes:
\[ (Lu)(x_c) \approx \sum_{k=1}^{n} w_k \, u(x_k) \]
- The differentiation weights $w_k$ are calculated so that the result is exact for all PHS RBF interpolants with polynomials, evaluated at the node locations in the stencil
- This process is repeated N times, N being the total number of nodes in the domain
- We will be adding polynomials of much higher order
- Constraints ensure the RBF basis reproduces polynomials up to the given degree
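A minimal sketch of the weight calculation for a single stencil follows. It assumes, for illustration only, that $L$ is the 2-D Laplacian, that the PHS is $r^3$, and that polynomials up to degree 2 are appended; the function name `rbf_fd_laplacian_weights` and the spiral node layout are hypothetical and not taken from the slides.

```python
import numpy as np

def rbf_fd_laplacian_weights(stencil, center):
    """Weights w_k with sum_k w_k u(x_k) ~ (Laplacian u)(center) in 2-D.

    Assumes PHS phi(r) = r^3 plus polynomials up to degree 2 (illustrative choice).
    """
    X = stencil - center                 # shift so the evaluation point is the origin
    n = X.shape[0]
    x, y = X[:, 0], X[:, 1]

    # RBF block: A_ij = ||x_i - x_j||^3
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = r ** 3

    # Polynomial block: monomials 1, x, y, x^2, xy, y^2 at the stencil nodes
    P = np.column_stack([np.ones(n), x, y, x**2, x * y, y**2])
    m = P.shape[1]

    # Right-hand side: the operator applied to each basis function, at the center.
    # In 2-D, Laplacian(r^3) = 9 r; for the monomials at the origin it is (0, 0, 0, 2, 0, 2).
    rhs = np.concatenate([9.0 * np.linalg.norm(X, axis=1),
                          [0.0, 0.0, 0.0, 2.0, 0.0, 2.0]])

    # Saddle-point system: the P^T rows are the polynomial-reproduction constraints
    M = np.block([[A, P], [P.T, np.zeros((m, m))]])
    return np.linalg.solve(M, rhs)[:n]   # discard the Lagrange multipliers

# Hypothetical 21-node stencil: a spiral of well-separated points around the origin
k = np.arange(21)
radius = np.sqrt(k / 20.0)
angle = k * np.pi * (3.0 - np.sqrt(5.0))
stencil = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

w = rbf_fd_laplacian_weights(stencil, stencil[0])

# The constraints make the weights exact for quadratics: Laplacian(x^2 + y^2 + 3x - 2) = 4
u = stencil[:, 0]**2 + stencil[:, 1]**2 + 3.0 * stencil[:, 0] - 2.0
approx = w @ u
```

In a full solver this calculation would be repeated for each of the N nodes, each with its own nearest-neighbor stencil, and the weights assembled row by row into a sparse differentiation matrix.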
Shallow water wave equations
Simplest equations to describe the evolution of the horizontal structure of a fluid in response to forcings, such as gravity and rotation.
Basic Properties
- Set of nonlinear hyperbolic equations derived from physical conservation laws
- Horizontal scales of motion >> Vertical scales of motion
- Vertical velocity and all derivatives in the vertical are not present
- It is a 2D model.
Areas of Application
- Atmospheric flows
- Tsunami prediction
- Planetary flows
- Storm surge
- Dam breaking
[Figure captions: Netherlands Overflowing; Jupiter's atmosphere]
GA RBFs
Convergence and Cost Efficiency of RBF-FD on 1 CPU
Performance on Intel i7 CPU
[Figure legends: Ref: NCAR SPH Model; Ref: RBF-FD]
NCAR SPH Model: 182,329 SPH bases (30km)
RBF-FD gave the first evidence that this model, the standard of comparison, was not so accurate.
R = Number of subdivisions of each cube face
N = Degree of Legendre polynomial in each square
Accuracy and Cost Efficiency of RBF-FD on 1 GPU
Multi-GPU performance
Cirrascale GX-8 system: 8 Nvidia K40 GPUs connected over a PCIe bus
[Figure: 2.6M Node Benchmark, Performance (GFLOPS) vs. Number of GPUs (1 to 8)]
[Figure: MPI Communication Overhead, Time Spent in MPI (%) vs. Number of GPUs (1 to 8)]
- Linear scaling approaching 1 TFLOP of performance
- Only 6% of time spent in MPI communication
- Cost of GPU system $42K, equal in performance to 40 nodes of NCAR supercomputer, cost $250K: 6x cheaper
Multi CPU and Multi GPU performance: 2.6M nodes on sphere (15km) (Elliott et al., 2017)
Latest GPU and CPU architectures for HPC
[Figure: NVIDIA PSG P100, Performance (GFLOPS) vs. Number of GPUs]
[Figure: Intel Broadwell CPU (36 cores/node, 72 nodes, 2592 cores), Performance (GFLOPS) vs. Number of Nodes]
Peak annotations, in panel order: 4.5 Teraflops, 6 Teraflops
Both are > 100X speedups over the highest achieved performance by the previous single device GPU implementation.
Shallow water wave equations on the sphere: Evolution of a highly unstable wave Day 3: Initial Signs of Instability Day 6: Unstable vortex dynamics
Vorticity at Finite Volume Spectral Element Discontinuous Galerkin RBF-FD Truth 0.35 x 0.35 DG, SE, RBF-FD
2D Compressible Navier-Stokes
Accurate time evolution of Temperature
Basis functions used: RBF $r^5$ + up to 4th degree polynomials
Hyperviscosity: use GA-based or PHS-based
With RBF-FD, it is easy to explore the intrinsic capabilities of different node layouts. Same code.
Hexagonal layouts have a long history, but never became mainstream due to implementation complexities.
Comparisons on different node layouts: change 1 line of code
Only showing half of the domain due to symmetry
[Figure: results at 800m, 400m, and 200m resolution; one panel annotated "Ugh!!"]
Comparison:
- Cartesian: Most unphysical artifacts ("wiggles"), 1st rotor not formed at 800m
- Hexagonal: Excellent results
- Scattered: Little performance penalty, but one gains great geometric flexibility
Comparisons to other numerical methods
Key issue: Data-based initialization of weather prediction models > 500m
Below: Comparisons of different numerical methods at 400m resolution
Only the RBF-FD calculation shows the beginning of the second rotor and can perform at 800m.
Same test problem, but with no physical viscosity 25m resolution (RBF-FD, hex nodes) Details when using different resolutions
3D Elliptic PDE: Modeling Electrical Currents in the Atmosphere Electric Current 3D Node Layout to 8km Thunderstorms (measured data) 8 km Nested spherical shells 8km to 52km 52 km
Sparsity pattern of 3D elliptic operator (99.998% zeros)
3D node layout: Nested Shell
Nicely banded but GMRES CRASHES
[Figure panels: Before any node reordering; After using reverse Cuthill-McKee]
Result: Testing with data, 4.2M nodes (100 km lat.-long. by 600m vertical), 31 mins on a laptop using GMRES
GitHub Open Source Code: Bayona et al., A 3-D RBF-FD solver for modelling the atmospheric Global Electric Circuit with topography (GEC-RBFFD v1.0), Geosci. Model Dev. 2015.
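The reordering step can be tried on a small stand-in problem with SciPy's `reverse_cuthill_mckee`. The sketch below is an assumption-laden illustration: it uses a 5-point Laplacian on a 20 x 20 grid with deliberately scrambled node numbering, not the 3-D RBF-FD operator from the slides, and it only measures bandwidth rather than running GMRES.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(M):
    """Largest |i - j| over the stored entries of a sparse matrix."""
    coo = M.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# Hypothetical stand-in operator: 5-point Laplacian on a 20 x 20 grid
m = 20
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
I = sp.identity(m)
L = (sp.kron(I, T) + sp.kron(T, I)).tocsr()

# Scramble the node numbering to mimic an unfavorable ordering
rng = np.random.default_rng(0)
p = rng.permutation(m * m)
L_scrambled = L[p][:, p].tocsr()

# Reverse Cuthill-McKee returns a permutation that clusters entries near the diagonal
perm = reverse_cuthill_mckee(L_scrambled, symmetric_mode=True)
L_rcm = L_scrambled[perm][:, perm]

bw_before = bandwidth(L_scrambled)
bw_after = bandwidth(L_rcm)
```

The same permutation must be applied to the right-hand side before solving, and inverted on the solution afterwards, so that the reordered system describes the same problem.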
Conclusions
Established:
- RBF-FD latches onto the physics at much coarser resolutions than other numerical methods, giving higher accuracy and convergence
- RBF-FD has shown strong linear scaling on the latest HPC platforms
- Startup cost for modeling with RBF-FD is low due to high algorithmic simplicity
Some recent review material
1. N. Flyer, G.B. Wright, and B. Fornberg, 2014. Radial basis function-generated finite differences: A mesh-free method for computational geosciences, Handbook of Geomathematics, Springer-Verlag.
2. B. Fornberg and N. Flyer, 2015. Solving PDEs with Radial Basis Functions, Acta Numerica.
3. B. Fornberg and N. Flyer, 2015. A Primer on Radial Basis Functions with Applications to the Geosciences, SIAM Press.