Hierarchical Hybrid Grids

Hierarchical Hybrid Grids. IDK Summer School 2012. Björn Gmeiner, Ulrich Rüde. July 2012

Contents: Mantle convection, Hierarchical Hybrid Grids, Smoothers, Geometric approximation, Performance modeling 2

Mantle convection 3

Simulation of seismic velocities Fig. by Schuberth and Moder - Simulation with TERRA (LMU) 4

Boussinesq model for mantle convection problems, derived from the equations for balance of forces, conservation of mass and energy:

-∇·(2η ε(u)) + ∇p = ρ(T) g,
∇·u = 0,
∂T/∂t + u·∇T - ∇·(κ ∇T) = γ.

u velocity, p dynamic pressure, T temperature, η viscosity of the material, ε(u) = (1/2)(∇u + (∇u)^T) strain rate tensor, ρ density, κ, γ, g thermal conductivity, heat sources, gravity vector 5

Discretization

Temporal, for the temperature (explicit): Modified Euler, BDF-2 scheme

Spatial, with FE for the Stokes system:
Add an artificial compressibility term
Choose a pair of FE spaces that satisfy the LBB condition:
  Q^d_{q+1} x Q_q, (q >= 1): Taylor-Hood elements
  Q^d_{q+1} x P_q, (q >= 0): discontinuous elements 6

Weak form: Stokes equation

(ε(ϕ^u_i), 2η ε(u^n_h)) - (∇·ϕ^u_i, p^n_h) = (ϕ^u_i, ρ(T^n_h) g),
(ϕ^p_l, ∇·u^n_h) = 0.

In matrix notation (saddle point structure with zero diagonal block):

[ A  B^T ] [ U^n ]   [ F^n ]
[ B   0  ] [ P^n ] = [  0  ]

Pressure correction approach (block elimination of the velocity):

[ A       B^T       ] [ U^n ]   [ F^n           ]
[ 0  -B A^(-1) B^T  ] [ P^n ] = [ -B A^(-1) F^n ]

A_ij = (ε(ϕ^u_i), 2η ε(ϕ^u_j)), B_ij = (ϕ^q_i, ∇·ϕ^u_j), F^n_i = (ϕ^u_i, ρ(T^n) g) 7
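The second block system is the Schur-complement form of the pressure correction: the pressure solves B A^(-1) B^T P^n = B A^(-1) F^n, after which the velocity follows from A U^n = F^n - B^T P^n. A minimal dense-matrix sketch in Python (numpy only, toy matrices and direct solves; not the matrix-free, iterative treatment used in HHG):

```python
import numpy as np

def stokes_pressure_correction(A, B, F):
    """Solve the saddle-point system [A B^T; B 0] [U; P] = [F; 0]
    via the Schur complement S = B A^{-1} B^T (dense toy version)."""
    Ainv_BT = np.linalg.solve(A, B.T)        # A^{-1} B^T
    Ainv_F = np.linalg.solve(A, F)           # A^{-1} F
    S = B @ Ainv_BT                          # Schur complement
    P = np.linalg.solve(S, B @ Ainv_F)       # pressure:  S P = B A^{-1} F
    U = np.linalg.solve(A, F - B.T @ P)      # velocity:  A U = F - B^T P
    return U, P
```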

Boussinesq model for mantle convection problems:

-∇·(2η ε(u)) + ∇p = ρ(T) g,
∇·u = 0,
∂T/∂t + u·∇T - ∇·(κ ∇T) = γ.

Additional challenges:
Viscosity η variations by four orders of magnitude in radial direction
High main memory consumption due to the operators
Operator evaluations of the Stokes equation are the performance-critical parts of a (forward) simulation 8

Viscosity profiles (Fig. provided by M. Mohr) 9

Two-grid cycle (correction scheme) 10
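For reference, a minimal sketch of the two-grid correction scheme in Python (numpy; the dense coarse operator and the generic smooth/restrict/prolongate callables are assumptions, HHG applies the same cycle recursively and matrix-free):

```python
import numpy as np

def two_grid_correction(A_h, A_H, b, x, smooth, restrict, prolongate, nu1=2, nu2=2):
    """One two-grid correction cycle for A_h x = b (dense toy version)."""
    x = smooth(A_h, b, x, nu1)            # pre-smoothing (e.g. Gauss-Seidel sweeps)
    r_h = b - A_h @ x                     # fine-grid residual
    r_H = restrict(r_h)                   # restrict the residual to the coarse grid
    e_H = np.linalg.solve(A_H, r_H)       # solve the coarse-grid error equation
    x = x + prolongate(e_H)               # prolongate and apply the correction
    x = smooth(A_h, b, x, nu2)            # post-smoothing
    return x
```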

Viscosity η variations by four orders of magnitude in radial direction:
  Adjust smoother
  Use matrix-dependent transfer operators
  Change coarse-grid operator
High main memory consumption due to the operators:
  Matrix-free operator evaluations
Operator evaluations of the Stokes equation are the performance-critical parts of a (forward) simulation:
  Avoid indirect addressing by (semi-)structured meshes (see the sketch below) 11
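Within a regularly refined block every interior unknown sees the same neighbourhood pattern, so the operator can be applied with fixed index offsets instead of stored sparse-matrix indices. A simplified matrix-free sketch (a constant-coefficient 7-point Laplacian on a structured 3D box instead of HHG's 15-point tetrahedral stencil; numpy):

```python
import numpy as np

def apply_7pt_laplacian(u, h):
    """Matrix-free 7-point Laplacian on the interior of a structured block:
    no sparse matrix and no indirect addressing, only fixed index offsets."""
    v = np.zeros_like(u)
    v[1:-1, 1:-1, 1:-1] = (6.0 * u[1:-1, 1:-1, 1:-1]
                           - u[:-2, 1:-1, 1:-1] - u[2:, 1:-1, 1:-1]
                           - u[1:-1, :-2, 1:-1] - u[1:-1, 2:, 1:-1]
                           - u[1:-1, 1:-1, :-2] - u[1:-1, 1:-1, 2:]) / h**2
    return v
```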

Hierarchical Hybrid Grids 12

HHG: combining finite element and multigrid methods

The FE mesh may be unstructured. Which nodes should be removed for coarsening? Not straightforward! Why not start from the coarse grid instead?

The Hierarchical Hybrid Grids (HHG) concept:
Benjamin Bergen*: prototype
Tobias Gradl: tuning, extensions and adaptivity

* Dissertation in Erlangen, ISC award in 2005. Currently at Los Alamos Labs. 13

HHG mesh decomposition into primitives (2D example): figure showing volume, edge and vertex primitives, each with a geometric and a memory representation and with inner and ghost points 14

Regular refinement of a tetrahedral element 15

Smoothers 16

Operator: 15-point stencil in a refined tetrahedral element 17

Different Tetrahedral Shapes: (a) Needle, (b) Wedge, (c) Spindle, (d) Spade, (e) Sliver, (f) Cap. Figure: different classes of degenerate elements 18

Comparison between local Fourier analysis (LFA) and HHG

            Regular  Needle  Wedge  Spindle  Spade   Sliver  Cap
LFA         0.250    0.982   0.786  0.980    0.590   0.896   0.872
HHG         0.195    0.982   0.740  0.980    0.500   0.889   0.739
edge ratio  1.0      10.0    4.0    10.0     1.67    1.40    1.71
max angle   60°      87.1°   82.8°  87.1°    112.9°  88.9°   117.2°
min angle   60°      5.7°    14.3°  5.7°     33.6°   45.6°   31.4°

Table: LFA smoothing factors, μ^(ν1+ν2), and two-grid convergence factors, ρ, of the four-color smoother for different tetrahedra. Results of the LFA are provided by Francisco Gaspar 19
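The LFA values above come from a Fourier symbol analysis of the smoother on the refined tetrahedral stencils. As a self-contained illustration of the technique only (damped Jacobi on the standard 2D 5-point Laplacian, not the four-color smoother analysed in the table), the smoothing factor can be computed by a brute-force sweep over the high-frequency range:

```python
import numpy as np

def jacobi_smoothing_factor(omega=0.8, n=400):
    """LFA smoothing factor of damped Jacobi for the 2D 5-point Laplacian:
    mu = max |1 - omega * (4 - 2 cos t1 - 2 cos t2) / 4| over the high
    frequencies, i.e. all (t1, t2) with max(|t1|, |t2|) >= pi/2."""
    t = np.linspace(-np.pi, np.pi, n)
    T1, T2 = np.meshgrid(t, t)
    symbol = 1.0 - omega * (4.0 - 2.0 * np.cos(T1) - 2.0 * np.cos(T2)) / 4.0
    high = np.maximum(np.abs(T1), np.abs(T2)) >= np.pi / 2
    return np.abs(symbol[high]).max()

print(jacobi_smoothing_factor())   # about 0.6 for omega = 0.8
```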

(a) Wedge, (b) Spade

(a) ν(1, 1) four-color Gauss-Seidel: 0.786; ν(1, 1) line-wise smoothing: 0.122
(b) ν(1, 1) four-color Gauss-Seidel: 0.590; ν(1, 1) line-wise smoothing: 0.230; ν(1, 0) zebra-plane smoothing: 0.105 20

(c) Needle, (d) Spindle

(c) ν(1, 1) four-color Gauss-Seidel: 0.982; ν(1, 0) lex. plane smoothing: 0.330; ν(1, 0) zebra-plane smoothing: 0.121
(d) ν(1, 1) four-color Gauss-Seidel: 0.980; ν(1, 0) zebra-plane smoothing: 0.124 21

(e) Sliver, (f) Cap: ??? 22

(e) Sliver, (f) Cap

But, according to Shewchuk: good tetrahedra are of two types: those that are not flat, and those that can grow arbitrarily flat without having a large planar or dihedral angle. The bad tetrahedra have error bounds that explode, and a dihedral angle or a planar angle that approaches 180°, as they are flattened.

J.R. Shewchuk. What Is a Good Linear Finite Element? Interpolation, Conditioning, Anisotropy, and Quality Measures. In Proc. of the 11th International Meshing Roundtable (2002). 23

Putting blocks together... (the coarsest mesh is generated by Gmsh) 24

Locally adaptive smoothing

We can predict the convergence rate for the single blocks of our domain by LFA. How to improve the convergence of the whole domain?

Choose for each block:
Different smoothers
Optimal relaxation parameters
Suitable number of pre-/post-smoothing steps 25

Chaotic smoothing (mixed smoothers)

Figure: Domain consisting of wedge, regular and needle tetrahedral types

Shape    edge ratio  max angle  min angle  smoothing   ν1, ν2  ρ
Needle   10          87°        5.7°       plane-wise  1, 0    0.12
Regular  1           60°        60°        4-color     2, 1    0.09
Wedge    5           110°       11°        line-wise   2, 1    0.09

Table: Convergence factors for the single blocks

Global convergence factor for the whole domain: 0.17; using a damping parameter at the interface: 0.14 27

Anisotropic smoothing

(a) Connection in isotropic direction, (b) connection in anisotropic direction

Gauss-Seidel: 0.934; single-region line-wise smoother: 0.072
(a) line-wise smoother: 0.094
(b) line-wise smoother: 0.847 28

Redistributing the number of smoothing steps

Elements colored by convergence rates. ν(pre-smoothing steps, post-smoothing steps). The total number of smoothing steps is kept constant (see the sketch below). 30
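A possible allocation heuristic, sketched in Python (hypothetical, not the scheme used for the results below): blocks with a poor predicted per-sweep convergence factor ρ_i receive proportionally more of a fixed global sweep budget, since roughly 1/(-log ρ_i) sweeps are needed per fixed error reduction.

```python
import math

def redistribute_smoothing_steps(rho, total_steps):
    """Hypothetical heuristic: distribute a fixed global budget of smoothing
    sweeps over the blocks in proportion to 1 / (-log rho_i), the approximate
    number of sweeps block i needs for a fixed error reduction.
    rho: predicted per-block convergence factors, 0 < rho_i < 1."""
    need = [1.0 / (-math.log(r)) for r in rho]        # relative work per block
    scale = total_steps / sum(need)
    return [max(1, round(scale * w)) for w in need]   # rounding may shift the total slightly

# Example: three blocks with predicted factors 0.12, 0.09, 0.51 and a budget of 12 sweeps
print(redistribute_smoothing_steps([0.12, 0.09, 0.51], 12))   # -> [2, 2, 8]
```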

Model Domain: Box in a Half Sphere

Coarsest mesh: 293 elements, 8 times refined, 1.02 x 10^8 unknowns
lex. Gauss-Seidel: β_min = 0.241, β_avg = 0.537
Please note that this is a serial run! 31

Model Domain: Box in a Half Sphere

We set up two conditions:
1. The required amount of communication is the same for all of the following methods: all ghost boundary points at the interfaces are exchanged eight times during the smoothing procedure.
2. The total smoothing workload has to be comparable to the unoptimized version in the solving phase. The additional time for the setup phase strongly depends on the implementation of an LFA evaluation in some of the following smoothing strategies. 32

Local Adaptive Smoothing

Smoothing strategy                     W-cycle  V-cycle  ω
Unoptimized                            0.64     0.65     1.0
Exact block-wise solving               0.11     0.18     -
Adaptive smoothing steps               0.51     0.53     1.0
+ Additional interface smoothing       0.34     0.34     1.0
Full damping                           0.59     0.56     1.15
Interior damping                       0.44     0.49     1.55 / 1.0
+ Additional interface smoothing       0.30     0.30     1.55 / 1.0
Adaptive interior damping (simplex)    0.46     0.51     variable
+ Additional interface smoothing       0.31     0.34     variable
Combined methods                       0.15     0.19     1.55 / 1.0

Table: Measured convergence factors for different smoothing strategies 33

Geometric approximation 34

Spherical refinement of an icosahedron (0) 35

Discretization with prismatic elements 36

Decomposition of prisms into three tetrahedra (figure) 37

Non-conforming decomposition of prisms into tetrahedra (figure) 38
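A minimal sketch of one standard prism-to-tetrahedra split (the local vertex numbering, bottom triangle v0 v1 v2 below top triangle v3 v4 v5, is an assumption; a conforming decomposition additionally requires neighbouring prisms to cut their shared quadrilateral faces along matching diagonals, e.g. chosen by global vertex IDs):

```python
def split_prism(v):
    """Split a prism with vertices v = [v0..v5] (bottom triangle v0, v1, v2,
    top triangle v3, v4, v5, with vi below v(i+3)) into three tetrahedra."""
    return [
        (v[0], v[1], v[2], v[3]),   # tetrahedron containing the bottom triangle
        (v[1], v[2], v[3], v[4]),
        (v[2], v[3], v[4], v[5]),   # tetrahedron containing the top triangle
    ]

print(split_prism([0, 1, 2, 3, 4, 5]))
```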

Spherical refinement of an icosahedron (0) 39

Spherical refinement of an icosahedron (1) 40

Spherical refinement of an icosahedron (2) 41

Spherical refinement of an icosahedron (3) 42

Spherical refinement of an icosahedron (4) 43

Intersection through the spherical refinement. Each tetrahedral element (~132k in total) was assigned to one Jugene compute core and further refined without curved boundaries. 44

Performance modeling 45

Storage of the variable viscosity field

Since the viscosity is stored as a variable field, there are different ways to store it for linear finite elements:

Element-wise:
More storage necessary for tetrahedral elements
Clear location of a jump when dealing with jumping coefficients

Node-wise (see the sketch below):
Requires averaging
No additional data layout necessary for structured tetrahedral elements 46
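A minimal sketch of the node-wise variant (plain Python; the connectivity format and the arithmetic mean are assumptions, other averages such as the harmonic mean may be preferable for jumping coefficients):

```python
def average_viscosity_to_nodes(elements, eta_elem, n_nodes):
    """Node-wise viscosity: average the viscosities of all elements adjacent
    to each node.  elements: list of node-index tuples, eta_elem: one
    viscosity value per element."""
    acc = [0.0] * n_nodes
    cnt = [0] * n_nodes
    for elem, eta in zip(elements, eta_elem):
        for node in elem:
            acc[node] += eta
            cnt[node] += 1
    return [a / c if c else 0.0 for a, c in zip(acc, cnt)]
```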

Viscosity averaging for a 15-point stencil

Coefficient averaging for each element:
Naive: 24 elements x 3 = 72 adds
By partial sums: 16 edges + 24 elements = 40 adds 47

Parallel generation of the coarsest mesh

1. Build up a local mesh on each process
   Loop over all elements in the mesh file: read own macro-volume
   Loop over all elements in the mesh file: read neighbouring macro-volumes
2. Map all other macro-primitives to processes
   Assign these primitives to the same process as the adjacent element with the largest ID
3. Assign IDs to edge and face primitives (see the sketch below)
   Build tuples for all macro-edges and macro-faces (with one tuple entry per vertex ID)
   Sort the lists of tuples
   Assign IDs according to the index of the tuple in the sorted list 48
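Step 3 can be sketched as follows (plain Python; names are illustrative, not the HHG API). Assuming every process builds the tuple list for the full coarse mesh, keying each macro-edge or macro-face by the sorted tuple of its global vertex IDs lets all processes derive identical IDs independently:

```python
def assign_primitive_ids(primitives):
    """Assign globally consistent IDs to macro-edges / macro-faces.
    primitives: iterable of vertex-ID tuples, e.g. (v_a, v_b) for an edge
    or (v_a, v_b, v_c) for a triangular face."""
    keys = sorted({tuple(sorted(p)) for p in primitives})   # canonical, duplicate-free, ordered
    return {key: i for i, key in enumerate(keys)}           # ID = index in the sorted list

# Example: the same edge listed twice with different orientations gets one ID
edges = [(7, 3), (3, 7), (3, 5), (5, 9)]
print(assign_primitive_ids(edges))   # {(3, 5): 0, (3, 7): 1, (5, 9): 2}
```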

Weak and strong scaling on Jugene

A triangle has a resolution of 6 km²; h_r = 3 km in the radial direction. 49

Strong scaling on Jugene (plot): seconds per V-cycle and memory in terabytes versus the number of compute cores (20 480, 40 960, 81 920), for constant and variable viscosity; annotated speedups x1.93, x3.80, x1.85, x3.43. 50

Performance modeling and analysis

V-cycle:
1. Computation: stencil evaluations
2. Network effects: bandwidth, latency, reductions
including all multigrid levels and the conjugate gradient solver on the coarsest mesh

Assumptions:
One inner point on the coarsest mesh per processor
Hardware information exclusively extracted from IBM System Blue Gene Solution: Blue Gene/P Application Development 51
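A toy version of such a model in Python (all parameter values, the flop count per unknown and the message counts are placeholders, not the Blue Gene/P figures behind the plot below):

```python
def vcycle_time_model(levels, unknowns, flops_per_unknown=50.0,
                      flop_rate=3.4e9, bandwidth=3.0e9, latency=3.0e-6,
                      msgs_per_level=12, cg_iters_coarse=20):
    """Toy performance model for one multigrid V-cycle (per process):
    per level, computation = stencil work / achievable flop rate and
    communication = messages * latency + ghost-layer bytes / bandwidth,
    plus latency-dominated global reductions of the CG solver on the
    coarsest grid.  All parameters are placeholders."""
    total = 0.0
    n = float(unknowns)                                   # unknowns per process, finest level
    for _ in range(levels):
        compute = n * flops_per_unknown / flop_rate
        ghost_bytes = 8.0 * n ** (2.0 / 3.0)              # surface-to-volume estimate, 8 B/value
        total += compute + msgs_per_level * latency + ghost_bytes / bandwidth
        n = max(n / 8.0, 1.0)                             # coarsening by a factor of 8 in 3D
    total += cg_iters_coarse * 2 * latency                # global reductions on the coarsest grid
    return total

print(vcycle_time_model(levels=8, unknowns=1.0e6))        # seconds for ~10^6 unknowns per process
```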

Blue Gene/P performance model (different setup), plot: time per V-cycle in seconds versus the number of cores (954, 5646, 37158, 361344, 2890752), split into bandwidth, coarse-grid CG and computation contributions and compared with measurements. 52

Summary
Introduction to the challenges of mantle convection simulations
Smoothing strategies for anisotropic elements and varying coefficients
Presentation of a conforming tetrahedral decomposition of an icosahedral mesh
Performance evaluation and modeling

Current work
Multigrid implementation & evaluation of (jumping) variable coefficients for HHG
Preparations for SuperMUC 53

Thank you for your attention! 54

Benchmarks on Blue Gene/P in Jülich (Jugene)
Compute node: 4-way SMP processor
Processor type: 32-bit PowerPC 450 core, 850 MHz
Cores: 294 912
Overall peak performance: 1 Petaflop/s
Main memory: 2 GB per node (aggregate 144 TB) 55