Generic finite element capabilities for forest-of-octrees AMR

1 Generic finite element capabilities for forest-of-octrees AMR
Carsten Burstedde, joint work with Omar Ghattas and Tobin Isaac
Institut für Numerische Simulation (INS), Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, USA
Feb 16, 2012, SIAM Conference on Parallel Processing for Scientific Computing, Savannah, GA, USA

3 What is AMR (adaptive mesh refinement)?
- Adapt the local element size
- To capture localized features
- To adjust local resolution/accuracy
- Whenever required during a simulation!
- Distinction: mesh generation (upfront) vs. AMR (dynamic)
- (Disclaimer: the terms are sometimes used interchangeably)

4 What is AMR?
[Figure: local refinement, local coarsening, dynamic; with G. Stadler, L. C. Wilcox]

8 Why use AMR?
Promised savings
- Uninteresting regions can be ignored
- Overall fewer elements for the same accuracy, which means less memory and less runtime
Promised gain
- Overall higher accuracy with the same number of elements, which means that models can have more detail
- Multiscale physics can be resolved

9 Why use AMR?
Mantle convection: high resolution for faults and plate boundaries
[Figure: artist rendering; image by US Geological Survey]
[Figure: simulation (with M. Gurnis, L. Alisic, CalTech); surface viscosity (colors), velocity (arrows)]

10 Why use AMR? Mantle convection: High resolution for faults and plate boundaries Zoom into the boundary between the Australia/New Hebrides plates (Sim: G. Stadler)

12 How to realize AMR?
What is so different from standard finite elements?
- Parallel computations rely on load balance
- Dynamic AMR requires frequent repartitioning
- Repartitioning is considered a hard problem
- Yet solving this problem is indispensable

13 How to realize AMR? Conforming tetrahedral (unstructured) AMR
- (Re-)partitioning is delegated to heuristic graph algorithms
[Figure: mesh data courtesy of D. Lazzara, MIT]

14 How to realize AMR? Patch-based (overlapping) AMR
- Overlapping patches introduce causality between levels

15 How to realize AMR? Octree-based (composite) AMR
- The hierarchy is flattened and can be traversed in one pass

16 How to realize AMR? Octree-based AMR
- The octree maps to a cube-like geometry
- 1:1 relation between octree leaves and mesh elements

26 How to realize AMR? Octree-based AMR
[Figure: octree leaves distributed among Proc 0, Proc 1, Proc 2]
Space-filling curve (SFC):
- Fast parallel partitioning
- Fast parallel tree algorithms for sorting and searching
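The partitioning idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming a uniform 2D grid and a Morton (z-order) curve; the function names `morton_index` and `partition_along_curve` are hypothetical and not part of p4est. Cells are sorted along the curve and the sorted sequence is cut into contiguous, nearly equal chunks, one per processor.

```python
def morton_index(x, y, level_bits):
    """Interleave the bits of (x, y); x supplies the lower bit of each pair."""
    index = 0
    for bit in range(level_bits):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def partition_along_curve(cells, num_procs, level_bits):
    """Sort cells along the curve and cut the sequence into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_index(c[0], c[1], level_bits))
    n = len(ordered)
    # Processor p owns the half-open index range [n*p/P, n*(p+1)/P).
    return [ordered[n * p // num_procs: n * (p + 1) // num_procs]
            for p in range(num_procs)]


# Uniform 4x4 grid (assumed for illustration), split among 3 processors.
cells = [(x, y) for x in range(4) for y in range(4)]
parts = partition_along_curve(cells, num_procs=3, level_bits=2)
```

Because every processor owns one contiguous piece of the curve, repartitioning after refinement only requires recomputing the cut positions; an adaptive mesh would sort leaves of mixed sizes in the same way.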

27 Synthesis: Forest of octrees
From tree...
- Partitioning problem solved
- So far: only cube-like geometric shapes

28 Synthesis: Forest of octrees
...to forest
- Motivation: geometric flexibility beyond the unit cube
- Challenge: non-matching coordinate systems between octrees

29 p4est forest-of-octrees algorithms
Connect SFC through all octrees [1]
[Figure: 2D forest of two trees k_0, k_1 with coordinates x_0, y_0 and x_1, y_1, partitioned among processes p_0, p_1, p_2]
Minimal global shared storage (metadata)
- Shared list of octant counts per core, (N)_p: 4 × P bytes
- Shared list of partition markers, (k; x, y, z)_p: 16 × P bytes
- 2D example above (h = 8): markers (0; 0, 0), (0; 6, 4), (1; 0, 4)
[1] C. Burstedde, L. C. Wilcox, O. Ghattas (SISC, 2011)
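The partition markers from the 2D example allow any process to find the owner of any position with a binary search and no communication. The sketch below is an illustration under stated assumptions: each marker is taken to be the first position owned by a process, h = 8 is taken to mean 3 coordinate bits per direction, and positions are ordered by tree number first and Morton index second. The helper names are hypothetical, not p4est functions.

```python
from bisect import bisect_right


def morton_index(x, y, bits):
    """Interleave the bits of (x, y); x supplies the lower bit of each pair."""
    index = 0
    for bit in range(bits):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def find_owner(markers, tree, x, y, bits):
    """Return the process whose marker is the last one not after (tree; x, y)."""
    keys = [(k, morton_index(mx, my, bits)) for k, mx, my in markers]
    return bisect_right(keys, (tree, morton_index(x, y, bits))) - 1


# Markers (k; x, y) from the slide, one per process; bits=3 is an assumption.
markers = [(0, 0, 0), (0, 6, 4), (1, 0, 4)]
owner_a = find_owner(markers, tree=0, x=5, y=4, bits=3)
owner_b = find_owner(markers, tree=1, x=0, y=0, bits=3)
owner_c = find_owner(markers, tree=1, x=4, y=4, bits=3)
```

Under these assumptions process 1 owns the end of tree 0 and the beginning of tree 1, which is how the curve connects through all trees.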

30 p4est forest-of-octrees algorithms
p4est core API (for write access)
- p4est_new: Create a uniformly refined, partitioned forest
- p4est_refine: Refine per element according to 0/1 callbacks
- p4est_coarsen: Coarsen 2^d elements according to 0/1 callbacks
- p4est_balance: Establish 2:1 neighbor sizes by additional refines
- p4est_partition: Parallel redistribution according to weights
p4est random read access (generic / extensible)
- p4est_ghost: Gather one layer of off-processor elements
- p4est_nodes: Gather globally unique node numbering
- Loop through p4est data structures as needed
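The callback-driven workflow listed above can be imitated with a serial toy quadtree. This is not the p4est C API: the functions `refine` and `balance` below are hypothetical Python stand-ins, a leaf is assumed to be a tuple (level, x, y) on the unit square, and only face neighbors are balanced. Coarsening and partitioning are left out to keep the sketch short.

```python
def children(leaf):
    """Four children of a leaf, listed in z-order."""
    level, x, y = leaf
    return [(level + 1, 2 * x + i, 2 * y + j) for j in (0, 1) for i in (0, 1)]


def refine(leaves, should_refine):
    """Replace every leaf for which the 0/1 callback returns True."""
    out = []
    for leaf in leaves:
        out.extend(children(leaf) if should_refine(leaf) else [leaf])
    return out


def containing_leaf(leafset, level, x, y):
    """Find the leaf of equal or coarser level that covers cell (level, x, y)."""
    for lev in range(level, -1, -1):
        candidate = (lev, x >> (level - lev), y >> (level - lev))
        if candidate in leafset:
            return candidate
    return None  # region is refined more finely than the requested level


def balance(leaves):
    """Refine until face neighbors differ by at most one level (2:1)."""
    while True:
        leafset = set(leaves)
        too_coarse = set()
        for level, x, y in leaves:
            for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < 2 ** level and 0 <= ny < 2 ** level):
                    continue  # neighbor lies outside the unit square
                neighbor = containing_leaf(leafset, level, nx, ny)
                if neighbor is not None and neighbor[0] < level - 1:
                    too_coarse.add(neighbor)
        if not too_coarse:
            return leaves
        leaves = refine(leaves, lambda leaf: leaf in too_coarse)


forest = [(0, 0, 0)]                                      # one root cell
forest = refine(forest, lambda leaf: True)                # uniform level 1
forest = refine(forest, lambda leaf: leaf == (1, 0, 0))   # local level 2
forest = refine(forest, lambda leaf: leaf == (2, 1, 1))   # local level 3
unbalanced_count = len(forest)
forest = balance(forest)
```

The application only supplies the per-element decision; the mesh module owns the ordering. Here the list stays in z-order because children replace their parent in place.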

31 p4est forest-of-octrees algorithms
Weak scalability on ORNL's Jaguar supercomputer
[Plot: percentage of runtime vs. number of CPU cores for Partition, Balance, Ghost, Nodes]
- Cost of New, Refine, Coarsen, Partition negligible
- [...] octants; < 10 seconds per million octants per core

32 p4est forest-of-octrees algorithms
Weak scalability on ORNL's Jaguar supercomputer
[Plot: seconds per (million elements / core) vs. number of CPU cores for Balance, Nodes]
- Dominant operations: Balance and Nodes scale over 18,360x
- [...] octants; < 10 seconds per million octants per core

33 p4est forest-of-octrees algorithms
p4est is a generic AMR module
- Rationale: support diverse numerical approaches
- Internal state: element ordering and parallel partition
- Provide a minimal API for mesh modification
- Connect to numerical discretizations / solvers ("App")
- p4est API calls are like MPI collectives (atomic to the App)
- p4est API hides parallel algorithms and communication
- App/p4est interaction: the API invokes per-element callbacks
- App/p4est interaction: internal state is accessed read-only

34 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
What is so different from standard FE? Not much!
- Use your favorite mesh generator to create the forest connectivity
- p4est encapsulates parallel AMR and partitioning
- Incorporate hanging nodes by additional scatter/gather
- Physics computation and element matrices are unaffected
- Matrix-free solvers are unaffected
- Assembly receives extra couplings
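The hanging-node scatter/gather can be sketched as follows. This is an illustration assuming low-order (linear) elements, where a hanging node at the midpoint of a coarse edge takes the average of the two edge endpoints; the data layout and the names `scatter` and `gather` are hypothetical. Scatter interpolates independent unknowns to element-local nodes before the unchanged element kernel runs, and gather applies the transpose to the element-local result.

```python
def scatter(u_independent, local_nodes):
    """Interpolate independent unknowns to the element-local nodes."""
    return [sum(w * u_independent[d] for d, w in node) for node in local_nodes]


def gather(r_local, local_nodes, num_independent):
    """Accumulate element-local values back onto independent unknowns."""
    r = [0.0] * num_independent
    for value, node in zip(r_local, local_nodes):
        for d, w in node:
            r[d] += w * value
    return r


# Each local node lists (independent_dof, weight) pairs. Regular nodes carry a
# single pair with weight 1; the hanging node averages the ends of edge 0-1.
local_nodes = [
    [(0, 1.0)],
    [(0, 0.5), (1, 0.5)],   # hanging node at the midpoint of edge 0-1
    [(2, 1.0)],
    [(3, 1.0)],
]

u = [1.0, 3.0, 5.0, 7.0]
u_local = scatter(u, local_nodes)
r = gather([1.0, 1.0, 1.0, 1.0], local_nodes, num_independent=4)
```

The element computation between the two calls never sees the constraint, which is why element matrices and matrix-free kernels stay as they are. In an assembled matrix the same weights produce the extra couplings between independent unknowns.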

35 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Convert mesh generator output into forest connectivity
- Hexahedral mesh generators (e.g. CUBIT): 1-to-1 conversion
- Tetrahedral mesh generators (e.g. tetgen): 1-to-4 conversion
- Pick the coarsest forest connectivity that respects the topology
- Align to geometry by adaptive octree refinement
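One common way to realize the 1-to-4 conversion is to give each tetrahedron vertex one hexahedron whose corners are the vertex, the midpoints of its three edges, the centroids of its three faces, and the cell centroid. The sketch below assumes this construction and a z-order corner numbering (first local axis fastest); the slides do not spell out either detail, and the function names are hypothetical.

```python
def centroid(vertices, indices):
    """Average of the selected vertices; indices are sorted so shared points
    come out identical no matter which hexahedron requests them."""
    idx = sorted(indices)
    return tuple(sum(vertices[i][c] for i in idx) / len(idx) for c in range(3))


def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def det3(a, b, c):
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def tet_to_hexes(vertices):
    """Split one non-degenerate tetrahedron into four hexahedra."""
    hexes = []
    for i in range(4):
        j, k, l = [n for n in range(4) if n != i]
        # Swap two neighbors if needed so the local axes are right-handed.
        edges = [sub(vertices[n], vertices[i]) for n in (j, k, l)]
        if det3(*edges) < 0:
            j, k = k, j
        hexes.append([
            centroid(vertices, [i]),           # corner 0: tetrahedron vertex
            centroid(vertices, [i, j]),        # corner 1: edge midpoint
            centroid(vertices, [i, k]),        # corner 2: edge midpoint
            centroid(vertices, [i, j, k]),     # corner 3: face centroid
            centroid(vertices, [i, l]),        # corner 4: edge midpoint
            centroid(vertices, [i, j, l]),     # corner 5: face centroid
            centroid(vertices, [i, k, l]),     # corner 6: face centroid
            centroid(vertices, [i, j, k, l]),  # corner 7: cell centroid
        ])
    return hexes


unit_tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
hexes = tet_to_hexes(unit_tet)
unique_points = {p for h in hexes for p in h}
```

The four hexahedra share 15 distinct points in total: 4 vertices, 6 edge midpoints, 4 face centroids, and 1 cell centroid. Each hexahedron can then serve as the root of one octree in the forest.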

36 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Convert mesh generator output into forest connectivity
[Figure: output from a tetrahedral mesh generator]

37 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Convert mesh generator output into forest connectivity
[Figure: convert 1 tetrahedron into 4 hexahedra]

38 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Convert mesh generator output into forest connectivity
[Figure: create the discretization on hexahedra as usual]

39 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Convert mesh generator output into forest connectivity
[Figure: refine recursively as needed]

40 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Convert mesh generator output into forest connectivity
[Figure: works at scale; airplane mesh with 270,000 octrees]

42 Generic AMR capabilities
p4est: Enable scalable dynamic AMR
Supported Apps so far
- Rhea (mantle convection, low-order FE, ICES/CalTech)
- dgea (seismic wave propagation, high-order DG, ICES)
- Maxwell (electromagnetic scattering, high-order DG, ICES)
- Ymir (ice sheet dynamics, high-order SE, ICES)
- deal.ii (generic FE discretization middleware, Texas A&M)
- ClawPack (hyperbolic conservation laws; work in progress)
- ...? (put your App here: p4est is free software!)

43 Acknowledgements
Publications
- Homepage:
Funding
- NSF DMS, OCI PetaApps, OPP CDI
- DOE SciDAC TOPS, SC
- AFOSR
HPC Resources
- Texas Advanced Computing Center (TACC)
- National Center for Computational Science (NCCS)
