Generic finite element capabilities for forest-of-octrees AMR

Generic finite element capabilities for forest-of-octrees AMR. Carsten Burstedde, joint work with Omar Ghattas and Tobin Isaac. Institut für Numerische Simulation (INS), Rheinische Friedrich-Wilhelms-Universität Bonn, Germany; Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, USA. Feb 16, 2012, SIAM Conference on Parallel Processing for Scientific Computing, Savannah, GA, USA.

What is AMR? Adapt the local element size: to capture localized features, to adjust local resolution and accuracy, whenever required during a simulation. Distinction: mesh generation (upfront) vs. AMR (dynamic). (Disclaimer: sometimes the terms are used interchangeably.)

What is AMR? Local refinement, local coarsening, dynamic adaptation (with G. Stadler, L. C. Wilcox).

Why use AMR? Promised savings: uninteresting regions can be ignored; overall fewer elements for the same accuracy, which means less memory and less runtime. Promised gain: overall higher accuracy with the same number of elements, which means that models can have more detail and multiscale physics can be resolved.

Why use AMR? Mantle convection: high resolution for faults and plate boundaries. (Artist rendering: image by US Geological Survey. Simulation with M. Gurnis, L. Alisic, CalTech: surface viscosity in colors, velocity as arrows.)

Why use AMR? Mantle convection: high resolution for faults and plate boundaries. Zoom into the boundary between the Australia/New Hebrides plates (simulation: G. Stadler).

How to realize AMR? What is so different from standard finite elements? Parallel computations rely on load balance; dynamic AMR requires frequent repartitioning; repartitioning is considered a hard problem; yet solving it is indispensable.

How to realize AMR? Conforming tetrahedral (unstructured) AMR: (re-)partitioning is delegated to heuristic graph algorithms (mesh data courtesy D. Lazzara, MIT).

How to realize AMR? Patch-based (overlapping) AMR: overlapping patches introduce causality between levels.

How to realize AMR? Octree-based (composite) AMR: the hierarchy is flattened and can be traversed in one pass.

How to realize AMR? Octree-based AMR: the octree maps to a cube-like geometry, with a 1:1 relation between octree leaves and mesh elements.

How to realize AMR? Octree-based AMR. [Figure: leaves colored by owning process, Proc 0 / Proc 1 / Proc 2.] Space-filling curve (SFC): fast parallel partitioning; fast parallel tree algorithms for sorting and searching.
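The SFC used here is the Morton (Z-order) curve. A minimal sketch of how a leaf's position on the curve can be computed by bit interleaving follows; the function name morton2d and the fixed 32-bit coordinate width are illustrative assumptions, not p4est's actual internals:

```c
#include <stdint.h>

/* Morton (Z-order) key of a quadrant anchor (x, y): interleave the
 * coordinate bits so that sorting leaves by key traverses the mesh
 * along the space-filling curve.  Contiguous key ranges can then be
 * assigned to processes, which is what makes partitioning fast. */
static uint64_t
morton2d (uint32_t x, uint32_t y)
{
  uint64_t            key = 0;

  for (int b = 0; b < 32; ++b) {
    key |= (uint64_t) ((x >> b) & 1u) << (2 * b);
    key |= (uint64_t) ((y >> b) & 1u) << (2 * b + 1);
  }
  return key;
}
```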

Synthesis: Forest of octrees. From tree...: the partitioning problem is solved, but so far only cube-like geometric shapes are covered.

Synthesis: Forest of octrees. ...to forest: Motivation: geometric flexibility beyond the unit cube. Challenge: non-matching coordinate systems between octrees.

p4est forest-of-octrees algorithms. Connect the SFC through all octrees [1]. [Figure: two octrees k0 and k1 with local coordinates (x0, y0) and (x1, y1), partitioned among processes p0, p1, p2.] Minimal global shared storage (metadata): a shared list of octant counts per core (N), ≈ 4P bytes; a shared list of partition markers (k; x, y, z), ≈ 16P bytes. 2D example above (h = 8): markers (0; 0, 0), (0; 6, 4), (1; 0, 4). [1] C. Burstedde, L. C. Wilcox, O. Ghattas (SISC, 2011).
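As a rough illustration of this O(P) metadata, a sketch assuming the byte counts quoted on the slide; the type and field names are hypothetical, not p4est's actual structs:

```c
#include <stdint.h>

/* One partition marker per core: the first octant owned by that core,
 * identified by its octree number k and its anchor coordinates.
 * Four 4-byte fields = 16 bytes per core, i.e. 16P bytes overall. */
typedef struct partition_marker
{
  int32_t             k;          /* octree index */
  int32_t             x, y, z;    /* octant anchor coordinates */
}
partition_marker_t;

/* counts[p] = number of octants owned by core p (4P bytes overall).
 * Prefix sums of counts give each core's global octant offset, so the
 * owner of any octant can be found by binary search over P entries. */
```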

p4est forest-of-octrees algorithms. p4est core API (for write access):
p4est_new: create a uniformly refined, partitioned forest;
p4est_refine: refine per element according to 0/1 callbacks;
p4est_coarsen: coarsen families of 2^d elements according to 0/1 callbacks;
p4est_balance: establish 2:1 neighbor sizes by additional refinement;
p4est_partition: redistribute in parallel according to weights.
p4est random read access (generic / extensible):
p4est_ghost: gather one layer of off-process elements;
p4est_nodes: gather a globally unique node numbering;
loop through the p4est data structures as needed.
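A minimal sketch of how these calls compose, in 2D for brevity; it is written against the current public p4est API, so details such as the allow_for_coarsening argument of p4est_partition may differ from the 2012-era interface:

```c
#include <mpi.h>
#include <p4est.h>

/* 0/1 refinement callback: refine every quadrant below level 4. */
static int
refine_fn (p4est_t * p4est, p4est_topidx_t which_tree,
           p4est_quadrant_t * quadrant)
{
  return quadrant->level < 4;
}

int
main (int argc, char **argv)
{
  p4est_connectivity_t *conn;
  p4est_t            *p4est;

  MPI_Init (&argc, &argv);

  /* forest connectivity: a single quadtree covering the unit square */
  conn = p4est_connectivity_new_unitsquare ();

  /* create, refine, 2:1-balance, and repartition the forest */
  p4est = p4est_new (MPI_COMM_WORLD, conn, 0, NULL, NULL);
  p4est_refine (p4est, 1, refine_fn, NULL);
  p4est_balance (p4est, P4EST_CONNECT_FULL, NULL);
  p4est_partition (p4est, 0, NULL);

  p4est_destroy (p4est);
  p4est_connectivity_destroy (conn);
  MPI_Finalize ();
  return 0;
}
```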

p4est forest-of-octrees algorithms. Weak scalability on ORNL's Jaguar supercomputer. [Plot: percentage of runtime spent in Partition, Balance, Ghost, and Nodes, from 12 to 220,320 CPU cores.] Cost of New, Refine, Coarsen, Partition negligible. 5.13 × 10^11 octants; < 10 seconds per million octants per core.

p4est forest-of-octrees algorithms. Weak scalability on ORNL's Jaguar supercomputer. [Plot: seconds per (million elements / core) for Balance and Nodes, from 12 to 220,320 CPU cores.] Dominant operations: Balance and Nodes scale over the full 18,360x range. 5.13 × 10^11 octants; < 10 seconds per million octants per core.

p4est forest-of-octrees algorithms. p4est is a generic AMR module. Rationale: support diverse numerical approaches. Internal state: element ordering and parallel partition. Provide a minimal API for mesh modification; connect to numerical discretizations and solvers (the "App"). p4est API calls are like MPI collectives (atomic to the App), and the API hides parallel algorithms and communication. The API invokes per-element callbacks in the App; the App accesses p4est's internal state read-only.

Generic AMR capabilities, p4est: enable scalable dynamic AMR. What is so different from standard FE? Not much! Use your favorite mesh generator to create the forest connectivity; p4est encapsulates parallel AMR and partitioning; hanging nodes are incorporated by an additional scatter/gather (see the sketch below); physics computation and element matrices are unaffected; matrix-free solvers are unaffected; assembly receives extra couplings.
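A minimal sketch of that extra scatter/gather for bilinear elements in 2D: a hanging node on a 2:1 edge carries no independent degree of freedom and is interpolated from the two corner nodes of the coarse edge. The index arrays here are hypothetical, not p4est data structures:

```c
/* hang[i]: global index of the i-th hanging node;
 * corner[i][0..1]: global indices of the coarse-edge endpoints. */

/* Scatter: after a global-to-local copy, overwrite each hanging
 * entry with the interpolated value u = (u_c0 + u_c1) / 2. */
static void
scatter_hanging (double *u, const int *hang,
                 const int (*corner)[2], int num_hanging)
{
  for (int i = 0; i < num_hanging; ++i)
    u[hang[i]] = 0.5 * (u[corner[i][0]] + u[corner[i][1]]);
}

/* Gather (the transpose): accumulate each hanging contribution
 * back onto the coarse-edge endpoints and zero the hanging slot. */
static void
gather_hanging (double *u, const int *hang,
                const int (*corner)[2], int num_hanging)
{
  for (int i = 0; i < num_hanging; ++i) {
    u[corner[i][0]] += 0.5 * u[hang[i]];
    u[corner[i][1]] += 0.5 * u[hang[i]];
    u[hang[i]] = 0.0;
  }
}
```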

Generic AMR capabilities, p4est: enable scalable dynamic AMR. Convert mesh generator output into a forest connectivity: hexahedral mesh generators (e.g. CUBIT) allow a 1-to-1 conversion; tetrahedral mesh generators (e.g. tetgen) a 1-to-4 conversion. Pick the coarsest forest connectivity that respects the topology; align to the geometry by adaptive octree refinement.

Generic AMR capabilities, p4est: enable scalable dynamic AMR. Convert mesh generator output into a forest connectivity, step by step: start from the output of a tetrahedral mesh generator; convert each tetrahedron into 4 hexahedra (a geometric sketch follows below); create the discretization on the hexahedra as usual; refine recursively as needed. Works at scale: airplane mesh with 270,000 octrees.
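A sketch of the geometry behind the 1-to-4 split: the hexahedron attached to tet vertex v_k has as corners v_k itself, the midpoints of the three edges meeting at v_k, the centroids of the three faces containing v_k, and the barycenter of the tetrahedron. The function and its corner ordering are illustrative; a real conversion must emit corners in the solver's numbering convention:

```c
/* Fill the 8 corners of the hexahedron attached to tet vertex k.
 * tet[0..3]: tetrahedron vertex coordinates; hex[0..7]: output.
 * Corner 0 is v_k, corners 1-3 are edge midpoints, corners 4-6 are
 * face centroids, and corner 7 is the barycenter. */
static void
tet_to_hex_corner (const double tet[4][3], int k, double hex[8][3])
{
  int                 o[3], n = 0;

  for (int j = 0; j < 4; ++j)
    if (j != k)
      o[n++] = j;              /* the three vertices other than v_k */

  for (int d = 0; d < 3; ++d) {
    hex[0][d] = tet[k][d];
    for (int i = 0; i < 3; ++i) {
      /* midpoint of the edge from v_k to the i-th other vertex */
      hex[1 + i][d] = 0.5 * (tet[k][d] + tet[o[i]][d]);
      /* centroid of the face spanned by v_k and two other vertices */
      hex[4 + i][d] = (tet[k][d] + tet[o[(i + 1) % 3]][d]
                       + tet[o[(i + 2) % 3]][d]) / 3.0;
    }
    hex[7][d] = 0.25 * (tet[0][d] + tet[1][d] + tet[2][d] + tet[3][d]);
  }
}
```

Calling this for k = 0, 1, 2, 3 yields the four hexahedra that tile the tetrahedron.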

Generic AMR capabilities, p4est: enable scalable dynamic AMR. Supported Apps so far: Rhea (mantle convection, low-order FE, ICES/CalTech); dgea (seismic wave propagation, high-order DG, ICES); Maxwell (electromagnetic scattering, high-order DG, ICES); Ymir (ice sheet dynamics, high-order SE, ICES); deal.II (generic FE discretization middleware, Texas A&M); ClawPack (hyperbolic conservation laws; work in progress); ...? (put your App here: p4est is free software!)

Acknowledgements. Publications homepage: http://burstedde.ins.uni-bonn.de/. Funding: NSF (DMS, OCI PetaApps, OPP CDI), DOE (SciDAC TOPS, SC), AFOSR. HPC resources: Texas Advanced Computing Center (TACC), National Center for Computational Science (NCCS).