Generic finite element capabilities for forest-of-octrees AMR
Carsten Burstedde, joint work with Omar Ghattas and Tobin Isaac
Institut für Numerische Simulation (INS), Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, USA
Feb 16, 2012, SIAM Conference on Parallel Processing for Scientific Computing, Savannah, GA, USA
What is AMR? Adapt the local element size: to capture localized features, to adjust local resolution/accuracy, whenever required during a simulation. Distinction: mesh generation (upfront) vs. AMR (dynamic). (Disclaimer: sometimes the terms are used interchangeably.)
What is AMR? Local refinement, local coarsening, dynamic. (With G. Stadler, L.C. Wilcox)
Why use AMR? Promised savings: uninteresting regions can be ignored; overall fewer elements for the same accuracy, which means less memory and less runtime. Promised gain: overall higher accuracy with the same number of elements, which means that models can have more detail and multiscale physics can be resolved.
Why use AMR? Mantle convection: high resolution for faults and plate boundaries. (Artist rendering: image by US Geological Survey. Simulation with M. Gurnis, L. Alisic, Caltech: surface viscosity (colors), velocity (arrows).)
Why use AMR? Mantle convection: high resolution for faults and plate boundaries. Zoom into the boundary between the Australia/New Hebrides plates. (Simulation: G. Stadler)
How to realize AMR? What is so different from standard finite elements? Parallel computations rely on load balance; dynamic AMR requires frequent repartitioning; repartitioning is considered a hard problem; yet solving it is indispensable.
How to realize AMR? Conforming tetrahedral (unstructured) AMR: (re-)partitioning delegated to heuristic graph algorithms. (Mesh data courtesy D. Lazzara, MIT)
How to realize AMR? Patch-based (overlapping) AMR: overlapping patches introduce causality between levels.
How to realize AMR? Octree-based (composite) AMR: the hierarchy is flattened and can be traversed in one pass.
How to realize AMR? Octree-based AMR: the octree maps to cube-like geometry; 1:1 relation between octree leaves and mesh elements.
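To make the 1:1 leaf-to-element relation concrete, here is a minimal sketch (not part of the slides; it uses the public p4est quadrant fields and macros) of how a leaf's integer coordinates and level determine an element's position and size within its octree's reference square:

```c
#include <p4est.h>

/* Illustrative sketch: a leaf's integer coordinates (q->x, q->y) and its
 * level determine the element's lower-left corner and edge length inside
 * the octree's [0,1]^2 reference square; a per-octree geometry map would
 * then place the element in physical space. */
static void
leaf_to_element (const p4est_quadrant_t *q, double *x0, double *y0, double *h)
{
  const double scale = 1.0 / (double) P4EST_ROOT_LEN;

  *h  = (double) P4EST_QUADRANT_LEN (q->level) * scale;  /* edge length  */
  *x0 = (double) q->x * scale;                           /* lower-left x */
  *y0 = (double) q->y * scale;                           /* lower-left y */
}
```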
How to realize AMR? Octree-based AMR with a space-filling curve (SFC): fast parallel partitioning; fast parallel tree algorithms for sorting and searching. (Figure: leaves ordered along the SFC and split among Proc 0, Proc 1, Proc 2.)
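For illustration, a small sketch of the z-order (Morton) key that underlies such a space-filling curve: interleaving the bits of an octant's integer coordinates yields a key whose sorted order is the SFC order. This is a generic illustration of the technique, not the actual p4est routine:

```c
#include <stdint.h>

/* Illustrative sketch (not the p4est implementation): interleave the bits
 * of an octant's integer coordinates (x, y, z) to obtain its z-order
 * (Morton) key.  Sorting leaves by this key yields the space-filling
 * curve order used for partitioning, sorting, and searching. */
static uint64_t
morton_key_3d (uint32_t x, uint32_t y, uint32_t z, int num_bits)
{
  uint64_t key = 0;
  for (int b = 0; b < num_bits; ++b) {
    key |= (uint64_t) ((x >> b) & 1u) << (3 * b + 0);
    key |= (uint64_t) ((y >> b) & 1u) << (3 * b + 1);
    key |= (uint64_t) ((z >> b) & 1u) << (3 * b + 2);
  }
  return key;
}
```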
Synthesis: Forest of octrees. From tree...: the partitioning problem is solved, but so far only cube-like geometric shapes are covered.
Synthesis: Forest of octrees. ...to forest: Motivation: geometric flexibility beyond the unit cube. Challenge: non-matching coordinate systems between octrees.
p4est forest-of-octrees algorithms. Connect the SFC through all octrees [1]. (Figure: two octrees k0, k1 with local coordinates (x0, y0) and (x1, y1); their leaves are ordered along the SFC and partitioned among processes p0, p1, p2.) Minimal global shared storage (metadata): shared list of octant counts per core (N_p), 4P bytes; shared list of partition markers (k; x, y, z)_p, 16P bytes. 2D example above (h = 8): markers (0; 0, 0), (0; 6, 4), (1; 0, 4). [1] C. Burstedde, L. C. Wilcox, O. Ghattas (SISC, 2011)
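A hedged sketch of what such O(P) shared metadata could look like in C; the type and field names are illustrative, not p4est's actual internals, but the byte counts match the slide:

```c
#include <stdint.h>

/* Illustrative sketch of the O(P) shared partition metadata described
 * above (names are not p4est's internals). */
typedef struct partition_marker
{
  int32_t which_tree;    /* octree index k of a process' first octant */
  int32_t x, y, z;       /* integer coordinates of that octant        */
}
partition_marker_t;      /* 16 bytes per process                      */

/* Replicated on every process, one entry per MPI rank (P ranks total):
 *   octant_counts[p]     -- number of octants owned by rank p (4 bytes)
 *   partition_markers[p] -- first octant of rank p in SFC order (16 bytes)
 * The owner of any given octant can then be found by binary search over
 * the sorted partition markers, without communication. */
static int32_t            *octant_counts;
static partition_marker_t *partition_markers;
```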
p4est forest-of-octrees algorithms. p4est core API (for write access):
p4est_new: create a uniformly refined, partitioned forest
p4est_refine: refine per element according to 0/1 callbacks
p4est_coarsen: coarsen 2^d elements according to 0/1 callbacks
p4est_balance: establish 2:1 neighbor sizes by additional refinement
p4est_partition: parallel redistribution according to weights
p4est random read access (generic / extensible):
p4est_ghost: gather one layer of off-processor elements
p4est_nodes: gather a globally unique node numbering
Loop through p4est data structures as needed.
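A minimal 2D usage sketch of this core API (3D uses the p8est_ prefix); exact signatures may differ slightly between p4est versions:

```c
#include <mpi.h>
#include <p4est.h>

/* Per-element callback: return nonzero to refine this quadrant. */
static int
refine_fn (p4est_t *p4est, p4est_topidx_t which_tree,
           p4est_quadrant_t *quadrant)
{
  return quadrant->level < 4;   /* refine everything up to level 4 */
}

int
main (int argc, char **argv)
{
  MPI_Init (&argc, &argv);

  /* Single-octree connectivity for the unit square. */
  p4est_connectivity_t *conn = p4est_connectivity_new_unitsquare ();

  /* Create a coarse, partitioned forest (no per-element user data). */
  p4est_t *p4est = p4est_new (MPI_COMM_WORLD, conn, 0, NULL, NULL);

  p4est_refine (p4est, 1, refine_fn, NULL);          /* recursive refine   */
  p4est_balance (p4est, P4EST_CONNECT_FULL, NULL);   /* enforce 2:1 sizes  */
  p4est_partition (p4est, 0, NULL);                  /* equal element count */

  /* One layer of off-processor elements for assembly/communication. */
  p4est_ghost_t *ghost = p4est_ghost_new (p4est, P4EST_CONNECT_FULL);

  p4est_ghost_destroy (ghost);
  p4est_destroy (p4est);
  p4est_connectivity_destroy (conn);
  MPI_Finalize ();
  return 0;
}
```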
p4est forest-of-octrees algorithms. Weak scalability on ORNL's Jaguar supercomputer. (Figure: percentage of runtime spent in Partition, Balance, Ghost, Nodes for 12 to 220,320 CPU cores.) Cost of New, Refine, Coarsen, Partition negligible. 5.13 × 10^11 octants; < 10 seconds per million octants per core.
p4est forest-of-octrees algorithms. Weak scalability on ORNL's Jaguar supercomputer. (Figure: seconds per (million elements / core) for Balance and Nodes, 12 to 220,320 CPU cores.) Dominant operations: Balance and Nodes scale over an 18,360x range of core counts. 5.13 × 10^11 octants; < 10 seconds per million octants per core.
p4est forest-of-octrees algorithms. p4est is a generic AMR module. Rationale: support diverse numerical approaches. Internal state: element ordering and parallel partition. Provide a minimal API for mesh modification; connect to numerical discretizations / solvers ("App"). p4est API calls are like MPI collectives (atomic to the App); the API hides parallel algorithms and communication. Interaction between App and p4est: API calls invoke per-element callbacks in the App; the App accesses p4est internal state read-only.
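A sketch of the read-only access pattern mentioned above: the App loops over p4est's process-local trees and leaf octants through the public accessors, never modifying the forest (illustrative usage, not prescribed by the slides):

```c
#include <p4est.h>

/* Read-only traversal of all process-local leaf octants.  All forest
 * modifications go through the write API, not through this loop. */
static void
visit_local_elements (p4est_t *p4est)
{
  for (p4est_topidx_t tt = p4est->first_local_tree;
       tt <= p4est->last_local_tree; ++tt) {
    p4est_tree_t *tree = p4est_tree_array_index (p4est->trees, tt);

    for (size_t zz = 0; zz < tree->quadrants.elem_count; ++zz) {
      p4est_quadrant_t *q =
        p4est_quadrant_array_index (&tree->quadrants, zz);

      /* The App reads integer coordinates and refinement level here,
       * e.g. to build its own element data structures. */
      (void) q->x; (void) q->y; (void) q->level;
    }
  }
}
```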
Generic AMR capabilities: p4est enables scalable dynamic AMR. What is so different from standard FE? Not much! Use your favorite mesh generator to create the forest connectivity; p4est encapsulates parallel AMR and partitioning; incorporate hanging nodes by an additional scatter/gather (see the sketch below); physics computation and element matrices are unaffected; matrix-free solvers are unaffected; assembly receives extra couplings.
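As a concrete illustration of the extra scatter/gather, assume bilinear (Q1) elements under 2:1 balance; then a node hanging at the midpoint of a coarser neighbor's edge is interpolated from the two independent edge endpoints, and its residual is scattered back by the transposed weights. A minimal sketch (illustrative, not the slides' implementation):

```c
/* Gather: a hanging midpoint node takes the average of the two
 * independent edge endpoints (Q1 interpolation under 2:1 balance). */
static double
hanging_node_gather (double u_end0, double u_end1)
{
  return 0.5 * (u_end0 + u_end1);       /* interpolate hanging value */
}

/* Scatter: the hanging node's residual contribution is accumulated
 * back to the two independent endpoints (transposed interpolation). */
static void
hanging_node_scatter (double r_hanging, double *r_end0, double *r_end1)
{
  *r_end0 += 0.5 * r_hanging;
  *r_end1 += 0.5 * r_hanging;
}
```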
Generic AMR capabilities: p4est enables scalable dynamic AMR. Convert mesh generator output into a forest connectivity. Hexahedral mesh generators (e.g. CUBIT): 1-to-1 conversion. Tetrahedral mesh generators (e.g. tetgen): 1-to-4 conversion. Pick the coarsest forest connectivity that respects the topology; align to the geometry by adaptive octree refinement.
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Output from tetrahedral mesh generator
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Convert 1 tetrahedron into 4 hexahedra
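A geometric sketch of the 1-to-4 conversion under the usual construction: each tetrahedron corner spawns one hexahedron whose other corners are the three adjacent edge midpoints, the three adjacent face centroids, and the cell centroid. The corner ordering/orientation required by p8est is not enforced here; this is an illustrative sketch only:

```c
/* Illustrative 1-to-4 tet-to-hex split; corner ordering is NOT brought
 * into p8est's z-order convention here. */
typedef struct { double x, y, z; } point3;

/* Average of the points p[idx[0]], ..., p[idx[n-1]]. */
static point3
point_average (const point3 *p, const int *idx, int n)
{
  point3 m = { 0.0, 0.0, 0.0 };
  for (int i = 0; i < n; ++i) {
    m.x += p[idx[i]].x / n;
    m.y += p[idx[i]].y / n;
    m.z += p[idx[i]].z / n;
  }
  return m;
}

static void
tet_to_four_hexes (const point3 v[4], point3 hex[4][8])
{
  const int edges[6][2] = { {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3} };
  const int faces[4][3] = { {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3} };
  const int all[4]      = { 0, 1, 2, 3 };
  /* edges/faces adjacent to each of the four tetrahedron corners */
  const int vedges[4][3] = { {0,1,2}, {0,3,4}, {1,3,5}, {2,4,5} };
  const int vfaces[4][3] = { {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3} };

  point3 e[6], f[4], c = point_average (v, all, 4);
  for (int i = 0; i < 6; ++i) e[i] = point_average (v, edges[i], 2);
  for (int i = 0; i < 4; ++i) f[i] = point_average (v, faces[i], 3);

  /* One hexahedron per tetrahedron corner. */
  for (int i = 0; i < 4; ++i) {
    point3 h[8] = { v[i],
                    e[vedges[i][0]], e[vedges[i][1]], e[vedges[i][2]],
                    f[vfaces[i][0]], f[vfaces[i][1]], f[vfaces[i][2]],
                    c };
    for (int k = 0; k < 8; ++k) hex[i][k] = h[k];
  }
}
```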
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Create discretization on hexahedra as usual
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Refine recursively as needed
Generic AMR capabilities p4est Enable scalable dynamic AMR Convert mesh generator output into forest connectivity Works at scale: Airplane mesh with 270,000 octrees
Generic AMR capabilities: p4est enables scalable dynamic AMR. Supported Apps so far: Rhea (mantle convection, low-order FE, ICES/Caltech); dgea (seismic wave propagation, high-order DG, ICES); Maxwell (electromagnetic scattering, high-order DG, ICES); Ymir (ice sheet dynamics, high-order SE, ICES); deal.II (generic FE discretization middleware, Texas A&M); ClawPack (hyperbolic conservation laws; work in progress); ...? (Put your App here: p4est is free software!)
Acknowledgements. Publications homepage: http://burstedde.ins.uni-bonn.de/ Funding: NSF (DMS, OCI PetaApps, OPP CDI), DOE (SciDAC TOPS, SC), AFOSR. HPC resources: Texas Advanced Computing Center (TACC), National Center for Computational Science (NCCS).