Generic finite element capabilities for forest-of-octrees AMR
Carsten Burstedde, joint work with Omar Ghattas and Tobin Isaac
Institut für Numerische Simulation (INS), Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, USA
Feb 16, 2012, SIAM Conference on Parallel Processing for Scientific Computing, Savannah, GA, USA
What is AMR? Adapt the local element size: to capture localized features, to adjust local resolution/accuracy, whenever required during a simulation. Distinction: mesh generation (upfront) vs. AMR (dynamic). (Disclaimer: sometimes the terms are used interchangeably.)
What is AMR? Local refinement, local coarsening, dynamic. (With G. Stadler, L.C. Wilcox)
Why use AMR? Promised savings: uninteresting regions can be ignored; overall fewer elements for the same accuracy, which means less memory and less runtime. Promised gain: overall higher accuracy with the same number of elements, which means that models can have more detail and multiscale physics can be resolved.
Why use AMR? Mantle convection: high resolution for faults and plate boundaries. (Artist rendering: image by US Geological Survey. Simulation with M. Gurnis, L. Alisic, Caltech: surface viscosity (colors), velocity (arrows).)
Why use AMR? Mantle convection: high resolution for faults and plate boundaries. Zoom into the boundary between the Australia/New Hebrides plates. (Simulation: G. Stadler)
How to realize AMR? What is so different from standard finite elements? Parallel computations rely on load balance; dynamic AMR requires frequent repartitioning; repartitioning is considered a hard problem; yet solving it is indispensable.
How to realize AMR? Conforming tetrahedral (unstructured) AMR: (re-)partitioning delegated to heuristic graph algorithms. (Mesh data courtesy D. Lazzara, MIT)
How to realize AMR? Patch-based (overlapping) AMR: overlapping patches introduce causality between levels.
How to realize AMR? Octree-based (composite) AMR: the hierarchy is flattened and can be traversed in one pass.
How to realize AMR? Octree-based AMR: the octree maps to cube-like geometry; 1:1 relation between octree leaves and mesh elements.
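To make the 1:1 leaf-to-element relation concrete, here is a minimal sketch (not part of the slides; it uses the public p4est quadrant fields and macros) of how a leaf's integer coordinates and level determine an element's position and size within its octree's reference square:

```c
#include <p4est.h>

/* Illustrative sketch: a leaf's integer coordinates (q->x, q->y) and its
 * level determine the element's lower-left corner and edge length inside
 * the octree's [0,1]^2 reference square; a per-octree geometry map would
 * then place the element in physical space. */
static void
leaf_to_element (const p4est_quadrant_t *q, double *x0, double *y0, double *h)
{
  const double scale = 1.0 / (double) P4EST_ROOT_LEN;

  *h  = (double) P4EST_QUADRANT_LEN (q->level) * scale;  /* edge length  */
  *x0 = (double) q->x * scale;                           /* lower-left x */
  *y0 = (double) q->y * scale;                           /* lower-left y */
}
```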
How to realize AMR? Octree-based AMR with a space-filling curve (SFC): fast parallel partitioning; fast parallel tree algorithms for sorting and searching. (Figure: leaves ordered along the SFC and split among Proc 0, Proc 1, Proc 2.)
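For illustration, a small sketch of the z-order (Morton) key that underlies such a space-filling curve: interleaving the bits of an octant's integer coordinates yields a key whose sorted order is the SFC order. This is a generic illustration of the technique, not the actual p4est routine:

```c
#include <stdint.h>

/* Illustrative sketch (not the p4est implementation): interleave the bits
 * of an octant's integer coordinates (x, y, z) to obtain its z-order
 * (Morton) key.  Sorting leaves by this key yields the space-filling
 * curve order used for partitioning, sorting, and searching. */
static uint64_t
morton_key_3d (uint32_t x, uint32_t y, uint32_t z, int num_bits)
{
  uint64_t key = 0;
  for (int b = 0; b < num_bits; ++b) {
    key |= (uint64_t) ((x >> b) & 1u) << (3 * b + 0);
    key |= (uint64_t) ((y >> b) & 1u) << (3 * b + 1);
    key |= (uint64_t) ((z >> b) & 1u) << (3 * b + 2);
  }
  return key;
}
```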
Synthesis: Forest of octrees. From tree...: the partitioning problem is solved, but so far only cube-like geometric shapes are covered.
Synthesis: Forest of octrees. ...to forest: Motivation: geometric flexibility beyond the unit cube. Challenge: non-matching coordinate systems between octrees.
p4est forest-of-octrees algorithms. Connect the SFC through all octrees [1]. (Figure: two octrees k0, k1 with local coordinates (x0, y0) and (x1, y1); their leaves are ordered along the SFC and partitioned among processes p0, p1, p2.) Minimal global shared storage (metadata): shared list of octant counts per core (N_p), 4P bytes; shared list of partition markers (k; x, y, z)_p, 16P bytes. 2D example above (h = 8): markers (0; 0, 0), (0; 6, 4), (1; 0, 4). [1] C. Burstedde, L. C. Wilcox, O. Ghattas (SISC, 2011)
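A hedged sketch of what such O(P) shared metadata could look like in C; the type and field names are illustrative, not p4est's actual internals, but the byte counts match the slide:

```c
#include <stdint.h>

/* Illustrative sketch of the O(P) shared partition metadata described
 * above (names are not p4est's internals). */
typedef struct partition_marker
{
  int32_t which_tree;    /* octree index k of a process' first octant */
  int32_t x, y, z;       /* integer coordinates of that octant        */
}
partition_marker_t;      /* 16 bytes per process                      */

/* Replicated on every process, one entry per MPI rank (P ranks total):
 *   octant_counts[p]     -- number of octants owned by rank p (4 bytes)
 *   partition_markers[p] -- first octant of rank p in SFC order (16 bytes)
 * The owner of any given octant can then be found by binary search over
 * the sorted partition markers, without communication. */
static int32_t            *octant_counts;
static partition_marker_t *partition_markers;
```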
p4est forest-of-octrees algorithms. p4est core API (for write access):
p4est_new: create a uniformly refined, partitioned forest
p4est_refine: refine per element according to 0/1 callbacks
p4est_coarsen: coarsen 2^d elements according to 0/1 callbacks
p4est_balance: establish 2:1 neighbor sizes by additional refinement
p4est_partition: parallel redistribution according to weights
p4est random read access (generic / extensible):
p4est_ghost: gather one layer of off-processor elements
p4est_nodes: gather a globally unique node numbering
Loop through p4est data structures as needed.
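A minimal 2D usage sketch of this core API (3D uses the p8est_ prefix); exact signatures may differ slightly between p4est versions:

```c
#include <mpi.h>
#include <p4est.h>

/* Per-element callback: return nonzero to refine this quadrant. */
static int
refine_fn (p4est_t *p4est, p4est_topidx_t which_tree,
           p4est_quadrant_t *quadrant)
{
  return quadrant->level < 4;   /* refine everything up to level 4 */
}

int
main (int argc, char **argv)
{
  MPI_Init (&argc, &argv);

  /* Single-octree connectivity for the unit square. */
  p4est_connectivity_t *conn = p4est_connectivity_new_unitsquare ();

  /* Create a coarse, partitioned forest (no per-element user data). */
  p4est_t *p4est = p4est_new (MPI_COMM_WORLD, conn, 0, NULL, NULL);

  p4est_refine (p4est, 1, refine_fn, NULL);          /* recursive refine   */
  p4est_balance (p4est, P4EST_CONNECT_FULL, NULL);   /* enforce 2:1 sizes  */
  p4est_partition (p4est, 0, NULL);                  /* equal element count */

  /* One layer of off-processor elements for assembly/communication. */
  p4est_ghost_t *ghost = p4est_ghost_new (p4est, P4EST_CONNECT_FULL);

  p4est_ghost_destroy (ghost);
  p4est_destroy (p4est);
  p4est_connectivity_destroy (conn);
  MPI_Finalize ();
  return 0;
}
```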
p4est forest-of-octrees algorithms. Weak scalability on ORNL's Jaguar supercomputer. (Figure: percentage of runtime spent in Partition, Balance, Ghost, Nodes for 12 to 220,320 CPU cores.) Cost of New, Refine, Coarsen, Partition negligible. 5.13 × 10^11 octants; < 10 seconds per million octants per core.
p4est forest-of-octrees algorithms. Weak scalability on ORNL's Jaguar supercomputer. (Figure: seconds per (million elements / core) for Balance and Nodes, 12 to 220,320 CPU cores.) Dominant operations: Balance and Nodes scale over an 18,360x range of core counts. 5.13 × 10^11 octants; < 10 seconds per million octants per core.
p4est forest-of-octrees algorithms. p4est is a generic AMR module. Rationale: support diverse numerical approaches. Internal state: element ordering and parallel partition. Provide a minimal API for mesh modification; connect to numerical discretizations / solvers ("App"). p4est API calls are like MPI collectives (atomic to the App); the API hides parallel algorithms and communication. Interaction between App and p4est: API calls invoke per-element callbacks in the App; the App accesses p4est internal state read-only.
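A sketch of the read-only access pattern mentioned above: the App loops over p4est's process-local trees and leaf octants through the public accessors, never modifying the forest (illustrative usage, not prescribed by the slides):

```c
#include <p4est.h>

/* Read-only traversal of all process-local leaf octants.  All forest
 * modifications go through the write API, not through this loop. */
static void
visit_local_elements (p4est_t *p4est)
{
  for (p4est_topidx_t tt = p4est->first_local_tree;
       tt <= p4est->last_local_tree; ++tt) {
    p4est_tree_t *tree = p4est_tree_array_index (p4est->trees, tt);

    for (size_t zz = 0; zz < tree->quadrants.elem_count; ++zz) {
      p4est_quadrant_t *q =
        p4est_quadrant_array_index (&tree->quadrants, zz);

      /* The App reads integer coordinates and refinement level here,
       * e.g. to build its own element data structures. */
      (void) q->x; (void) q->y; (void) q->level;
    }
  }
}
```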
Generic AMR capabilities: p4est enables scalable dynamic AMR. What is so different from standard FE? Not much! Use your favorite mesh generator to create the forest connectivity; p4est encapsulates parallel AMR and partitioning; incorporate hanging nodes by an additional scatter/gather (see the sketch below); physics computation and element matrices are unaffected; matrix-free solvers are unaffected; assembly receives extra couplings.
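As a concrete illustration of the extra scatter/gather, assume bilinear (Q1) elements under 2:1 balance; then a node hanging at the midpoint of a coarser neighbor's edge is interpolated from the two independent edge endpoints, and its residual is scattered back by the transposed weights. A minimal sketch (illustrative, not the slides' implementation):

```c
/* Gather: a hanging midpoint node takes the average of the two
 * independent edge endpoints (Q1 interpolation under 2:1 balance). */
static double
hanging_node_gather (double u_end0, double u_end1)
{
  return 0.5 * (u_end0 + u_end1);       /* interpolate hanging value */
}

/* Scatter: the hanging node's residual contribution is accumulated
 * back to the two independent endpoints (transposed interpolation). */
static void
hanging_node_scatter (double r_hanging, double *r_end0, double *r_end1)
{
  *r_end0 += 0.5 * r_hanging;
  *r_end1 += 0.5 * r_hanging;
}
```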
Generic AMR capabilities: p4est enables scalable dynamic AMR. Convert mesh generator output into a forest connectivity. Hexahedral mesh generators (e.g. CUBIT): 1-to-1 conversion. Tetrahedral mesh generators (e.g. tetgen): 1-to-4 conversion. Pick the coarsest forest connectivity that respects the topology; align to the geometry by adaptive octree refinement.
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Output from tetrahedral mesh generator
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Convert 1 tetrahedron into 4 hexahedra
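A geometric sketch of the 1-to-4 conversion under the usual construction: each tetrahedron corner spawns one hexahedron whose other corners are the three adjacent edge midpoints, the three adjacent face centroids, and the cell centroid. The corner ordering/orientation required by p8est is not enforced here; this is an illustrative sketch only:

```c
/* Illustrative 1-to-4 tet-to-hex split; corner ordering is NOT brought
 * into p8est's z-order convention here. */
typedef struct { double x, y, z; } point3;

/* Average of the points p[idx[0]], ..., p[idx[n-1]]. */
static point3
point_average (const point3 *p, const int *idx, int n)
{
  point3 m = { 0.0, 0.0, 0.0 };
  for (int i = 0; i < n; ++i) {
    m.x += p[idx[i]].x / n;
    m.y += p[idx[i]].y / n;
    m.z += p[idx[i]].z / n;
  }
  return m;
}

static void
tet_to_four_hexes (const point3 v[4], point3 hex[4][8])
{
  const int edges[6][2] = { {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3} };
  const int faces[4][3] = { {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3} };
  const int all[4]      = { 0, 1, 2, 3 };
  /* edges/faces adjacent to each of the four tetrahedron corners */
  const int vedges[4][3] = { {0,1,2}, {0,3,4}, {1,3,5}, {2,4,5} };
  const int vfaces[4][3] = { {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3} };

  point3 e[6], f[4], c = point_average (v, all, 4);
  for (int i = 0; i < 6; ++i) e[i] = point_average (v, edges[i], 2);
  for (int i = 0; i < 4; ++i) f[i] = point_average (v, faces[i], 3);

  /* One hexahedron per tetrahedron corner. */
  for (int i = 0; i < 4; ++i) {
    point3 h[8] = { v[i],
                    e[vedges[i][0]], e[vedges[i][1]], e[vedges[i][2]],
                    f[vfaces[i][0]], f[vfaces[i][1]], f[vfaces[i][2]],
                    c };
    for (int k = 0; k < 8; ++k) hex[i][k] = h[k];
  }
}
```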
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Create discretization on hexahedra as usual
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Refine recursively as needed
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Works at scale: Airplane mesh with 270,000 octrees
Generic AMR capabilities: p4est enables scalable dynamic AMR. Supported Apps so far: Rhea (mantle convection, low-order FE, ICES/Caltech); dgea (seismic wave propagation, high-order DG, ICES); Maxwell (electromagnetic scattering, high-order DG, ICES); Ymir (ice sheet dynamics, high-order SE, ICES); deal.II (generic FE discretization middleware, Texas A&M); ClawPack (hyperbolic conservation laws; work in progress); ...? (Put your App here: p4est is free software!)
Acknowledgements. Publications homepage: http://burstedde.ins.uni-bonn.de/ Funding: NSF (DMS, OCI PetaApps, OPP CDI), DOE (SciDAC TOPS, SC), AFOSR. HPC resources: Texas Advanced Computing Center (TACC), National Center for Computational Science (NCCS).